# KBDatalakeApps Pipeline Step-by-Step Testing

This notebook is a **test harness** for `KBDataLakeUtils` from
`lib/KBDatalakeApps/KBDatalakeUtils.py`. It mirrors how the KBase SDK impl file
works: the notebook environment reads token/config from files, then creates
a separate `KBDataLakeUtils` instance with those credentials injected.

**Architecture:**
- `util.py` (NotebookUtil) = test harness that reads token/config from files
- `KBDataLakeUtils` = the code under test, receives token/config as arguments
- Each cell creates a fresh `KBDataLakeUtils` instance (state lives on the filesystem)
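
The injection pattern described above can be sketched in a few lines. This is a minimal illustration, not the actual `util.py` or `KBDataLakeUtils` code: `FileBackedHarness` and `CodeUnderTest` are hypothetical stand-ins, and the token is assumed to live in a plain text file.

```python
class CodeUnderTest:
    """Hypothetical stand-in for the class being tested.

    It never touches credential files; everything arrives as arguments.
    """

    def __init__(self, token, directory):
        self.token = token
        self.directory = directory


class FileBackedHarness:
    """Hypothetical stand-in for the notebook-side helper.

    It is the only component that reads credentials from disk.
    """

    def __init__(self, token_path):
        self.token_path = token_path

    def read_token(self):
        # Read the token once per call; strip the trailing newline.
        with open(self.token_path) as handle:
            return handle.read().strip()

    def create_code_under_test(self, **kwargs):
        # Inject the token as a value, so the code under test never sees the path.
        return CodeUnderTest(token=self.read_token(), **kwargs)
```

Because every instance is built from injected values plus files on disk, creating a fresh instance in each cell costs little and keeps the cells independent of one another.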

**Pipeline steps tested:**
1. Process input arguments into user genome table
2. Download genome assemblies
3. Download genome genes & annotations
4. Run SKANI analysis
5. Annotate genomes with RAST
6. Build metabolic models (single + parallel)
7. Run phenotype simulations
8. Build SQLite database
9. Save annotated genomes to KBase
10. Save models to KBase
11. Generate KBase report

## Step 0: Setup & Configuration

Configure the pipeline parameters and save them to datacache. Every subsequent
cell reloads this config and creates a fresh `KBDataLakeUtils` instance.

The notebook's `util` reads token/config from your standard files
(`~/.tokens`, `~/.kbase/token`, `~/.kbutillib/config.yaml`). The
`create_pipeline_utils()` helper then injects those into `KBDataLakeUtils`
while blocking it from reading files itself.

- **Edit**: `workspace_name`, `input_refs`, and optionally `pipeline_dir`
- **Output**: `pipeline_config` saved to datacache
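
As a rough illustration of the save-then-reload pattern, the sketch below round-trips a config dictionary through a JSON file. It assumes a JSON-on-disk cache, which may differ from how `util.save()` and `util.load()` actually store data; `save_config` and `load_config` are hypothetical helper names.

```python
import json
import os


def save_config(cache_dir, name, data):
    """Write `data` to <cache_dir>/<name>.json and return the path."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{name}.json")
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2)
    return path


def load_config(cache_dir, name):
    """Read back a dictionary previously written by save_config()."""
    with open(os.path.join(cache_dir, f"{name}.json")) as handle:
        return json.load(handle)
```

Under this assumption only JSON-serializable values (strings, numbers, lists, dicts) survive the round trip, which is why the config holds plain values rather than live objects.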

In [None]:
%run util.py
import os

# ---- EDIT THESE PARAMETERS FOR YOUR TEST ----
workspace_name = 'chenry:narrative_1234567890'  # Your KBase workspace
input_refs = [
    # Add genome or genome set references here, e.g.:
    # '12345/6/7',
    # '12345/8/1',
]
worker_count = 4
pipeline_dir = os.path.join(util.data_dir, 'pipeline_test')
# ---- END PARAMETERS ----

parameters = {
    'input_refs': input_refs,
    'workspace_name': workspace_name,
    'suffix': '.datalake',
}

# Verify the notebook util has a token loaded
token = util.get_token('kbase')
print(f'KBase token loaded: {"yes" if token else "NO - check ~/.kbase/token or ~/.tokens"}')
print(f'Config keys: {list(util._config_hash.keys()) if util._config_hash else "none"}')

# Save configuration so other cells can reload it
util.save('pipeline_config', {
    'workspace_name': workspace_name,
    'input_refs': input_refs,
    'parameters': parameters,
    'pipeline_dir': pipeline_dir,
    'worker_count': worker_count,
})

# Quick smoke test: create a pipeline utils instance
os.makedirs(pipeline_dir, exist_ok=True)
pipeline = util.create_pipeline_utils(
    directory=pipeline_dir,
    workspace_name=workspace_name,
    parameters=parameters,
    worker_count=worker_count,
)
print(f'Pipeline directory: {pipeline.directory}')
print(f'Pipeline workspace: {pipeline.workspace_name}')
print(f'Pipeline token loaded: {"yes" if pipeline.get_token("kbase") else "NO"}')
print('\nSetup complete. Ready to test pipeline steps.')

## Step 1: Process Arguments into User Genome Table

Tests `pipeline_process_arguments_into_user_genome_table()`.

Translates the input reference list (genomes or genome sets) into a structured
metadata table saved as `user_genomes.tsv`.

- **Input**: `input_refs` from KBase workspace
- **Output**: `<pipeline_dir>/user_genomes.tsv`
- **Columns**: genome_id, species_name, taxonomy, genome_ref, assembly_ref,
  genome_type, genome_source_id, genome_source_name, num_contigs, num_proteins,
  num_noncoding_genes
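
A quick way to confirm the table has the columns listed above is to compare its header against the expected names. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper, not part of `KBDataLakeUtils`.

```python
import csv

# Column names taken from the list above.
EXPECTED_COLUMNS = [
    "genome_id", "species_name", "taxonomy", "genome_ref", "assembly_ref",
    "genome_type", "genome_source_id", "genome_source_name", "num_contigs",
    "num_proteins", "num_noncoding_genes",
]


def missing_columns(tsv_path, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the TSV header."""
    with open(tsv_path, newline="") as handle:
        # An empty file yields an empty header, so every column is reported.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in expected if name not in header]
```

An empty return value means every expected column is present; extra columns are ignored.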

In [None]:
%run util.py
import os
import pandas as pd  # explicit import, in case util.py does not expose pd

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    workspace_name=config['workspace_name'],
    parameters=config['parameters'],
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_process_arguments_into_user_genome_table()

# Inspect the output
output_path = os.path.join(config['pipeline_dir'], 'user_genomes.tsv')
if os.path.exists(output_path):
    df = pd.read_csv(output_path, sep='\t')
    print(f'\nGenomes table: {len(df)} rows, {len(df.columns)} columns')
    display(df)
else:
    print(f'Output file not created: {output_path}')

## Step 2: Download Genome Assemblies

Tests `pipeline_download_user_genome_assmemblies()`.

Downloads FASTA assembly files for all genomes listed in `user_genomes.tsv`.

- **Input**: `user_genomes.tsv` (assembly_ref column)
- **Output**: FASTA files in `<pipeline_dir>/assemblies/`
- **Requires**: KBase workspace access
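
File size alone does not show whether a download is a usable FASTA file. The following sketch counts records and sequence characters with plain Python; `count_fasta_records` is a hypothetical helper and assumes uncompressed text FASTA.

```python
def count_fasta_records(fasta_path):
    """Return (number of '>' header lines, total sequence characters)."""
    records = 0
    bases = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                records += 1
            else:
                bases += len(line)
    return records, bases
```

A result of `(0, 0)` for a non-empty file usually indicates that something other than FASTA was written, for example an error message.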

In [None]:
%run util.py
import os

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    workspace_name=config['workspace_name'],
    parameters=config['parameters'],
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_download_user_genome_assmemblies()

# Inspect output
assemblies_dir = os.path.join(config['pipeline_dir'], 'assemblies')
if os.path.exists(assemblies_dir):
    files = os.listdir(assemblies_dir)
    print(f'\nAssembly files ({len(files)}):')
    for f in sorted(files):
        size_kb = os.path.getsize(os.path.join(assemblies_dir, f)) / 1024
        print(f'  {f}: {size_kb:.1f} KB')
else:
    print('No assemblies directory created')

## Step 3: Download Genome Genes & Annotations

Tests `pipeline_download_user_genome_genes()`.

Downloads genes, features, and existing annotations for each genome into
per-genome TSV files.

- **Input**: `user_genomes.tsv` (genome_ref column)
- **Output**: `<pipeline_dir>/genomes/<genome_id>.tsv` per genome
- **Columns**: gene_id, aliases, contig, start, end, strand, type, functions,
  protein_translation, dna_sequence, ontology_terms
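
To get a feel for a per-genome table before annotation, it can help to tally features by `type` and count how many carry a protein sequence. This is a standard-library sketch with a hypothetical helper name, `summarize_gene_table`; it relies only on the `type` and `protein_translation` columns listed above.

```python
import csv
from collections import Counter


def summarize_gene_table(tsv_path):
    """Return (Counter of feature types, number of rows with a protein sequence)."""
    type_counts = Counter()
    with_protein = 0
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Rows without a type value are grouped under 'unknown'.
            type_counts[row.get("type") or "unknown"] += 1
            if (row.get("protein_translation") or "").strip():
                with_protein += 1
    return type_counts, with_protein
```

The `csv` module rejects fields longer than its default limit (131072 characters), so very long `dna_sequence` values may require raising it with `csv.field_size_limit()`.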

In [None]:
%run util.py
import os
import pandas as pd  # explicit import, in case util.py does not expose pd

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    workspace_name=config['workspace_name'],
    parameters=config['parameters'],
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_download_user_genome_genes()

# Inspect output
genomes_dir = os.path.join(config['pipeline_dir'], 'genomes')
if os.path.exists(genomes_dir):
    files = [f for f in os.listdir(genomes_dir) if f.endswith('.tsv')]
    print(f'\nGenome gene files ({len(files)}):')
    for f in sorted(files):
        df = pd.read_csv(os.path.join(genomes_dir, f), sep='\t')
        print(f'  {f}: {len(df)} features')
    # Show sample from first genome
    if files:
        sample = pd.read_csv(os.path.join(genomes_dir, files[0]), sep='\t')
        print(f'\nSample from {files[0]}:')
        display(sample.head())
else:
    print('No genomes directory created')

## Step 4: Run SKANI Analysis

Tests `pipeline_run_skani_analysis()`.

Runs SKANI (fast genomic distance estimation) against three sketch databases:
pangenome, fitness, and phenotype.

- **Input**: FASTA files in `<pipeline_dir>/assemblies/`
- **Output**: TSV files in `<pipeline_dir>/skani/` (one per database)
- **Columns**: genome_id, reference_genome, ani_percentage
- **Requires**: SKANI sketch databases configured in kbutillib config
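
Given the three output columns above, a common follow-up is to keep only the closest reference for each genome. The sketch below does this with the standard library; `best_hits` is a hypothetical helper and assumes `ani_percentage` parses as a float.

```python
import csv


def best_hits(tsv_path):
    """Map each genome_id to its (reference_genome, ani_percentage) with the highest ANI."""
    best = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            ani = float(row["ani_percentage"])
            current = best.get(row["genome_id"])
            # Keep the first hit seen, then replace it only on a strictly higher ANI.
            if current is None or ani > current[1]:
                best[row["genome_id"]] = (row["reference_genome"], ani)
    return best
```

Ties keep the first reference encountered, so the result depends on row order when two references have identical ANI.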

In [None]:
%run util.py
import os
import pandas as pd  # explicit import, in case util.py does not expose pd

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    workspace_name=config['workspace_name'],
    parameters=config['parameters'],
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_run_skani_analysis()

# Inspect output
skani_dir = os.path.join(config['pipeline_dir'], 'skani')
if os.path.exists(skani_dir):
    files = [f for f in os.listdir(skani_dir) if f.endswith('.tsv')]
    print(f'\nSKANI result files ({len(files)}):')
    for f in sorted(files):
        df = pd.read_csv(os.path.join(skani_dir, f), sep='\t')
        print(f'  {f}: {len(df)} hits')
        display(df.head())
else:
    print('No skani directory created')

## Step 5: Annotate Genomes with RAST

Tests `pipeline_annotate_user_genome_with_rast()`.

Submits protein sequences from each genome to RAST for functional annotation.
Translates RAST functions to SSO (Subsystem Ontology) terms and populates the
`Annotation:SSO` column in each genome TSV file. Skips genomes that already
have `Annotation:SSO` data (e.g., from the original KBase genome object).

- **Input**: Genome TSV files in `<pipeline_dir>/genomes/`
- **Output**: Updated genome TSV files with `Annotation:SSO` column
- **Format**: `SSO:nnnnn:description|rxn1,rxn2` entries separated by `;`
- **Requires**: RAST SDK service access
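
The `Annotation:SSO` format above can be unpacked with a few string splits. This sketch follows the same splitting order used later in this notebook (`;`, then `|`, then `:`); `parse_sso_column` is a hypothetical helper, and the IDs in the usage example are made up for illustration.

```python
def parse_sso_column(value):
    """Parse 'SSO:nnnnn:description|rxn1,rxn2;...' into a list of dicts."""
    terms = []
    for entry in str(value).split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Everything after the first '|' is the comma-separated reaction list.
        term_part, _, rxn_part = entry.partition("|")
        parts = term_part.split(":")
        if len(parts) < 2 or parts[0] != "SSO":
            continue  # not an SSO entry
        terms.append({
            "sso_id": f"SSO:{parts[1]}",
            # Descriptions may themselves contain ':', so rejoin the remainder.
            "description": ":".join(parts[2:]),
            "reactions": [rxn for rxn in rxn_part.split(",") if rxn],
        })
    return terms


# Made-up example value, not real pipeline output.
example = "SSO:000001:Example kinase|rxn00001,rxn00002;SSO:000002:Other role|"
for term in parse_sso_column(example):
    print(term["sso_id"], term["description"], term["reactions"])
```

A description that itself contains `;` or `|` would be split incorrectly by this approach, so it is only suitable where the format guarantees those characters are reserved.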

In [None]:
%run util.py
import os
import pandas as pd  # explicit import, in case util.py does not expose pd

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    parameters=config['parameters'],
    kb_version=config.get('kb_version', 'dev'),
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_annotate_user_genome_with_rast()

# Inspect output - check that Annotation:SSO column was added/populated
genomes_dir = os.path.join(config['pipeline_dir'], 'genomes')
if os.path.exists(genomes_dir):
    files = [f for f in os.listdir(genomes_dir) if f.endswith('.tsv')]
    for f in sorted(files):
        df = pd.read_csv(os.path.join(genomes_dir, f), sep='\t')
        has_sso = 'Annotation:SSO' in df.columns
        annotated = df['Annotation:SSO'].fillna('').astype(str).str.strip().ne('').sum() if has_sso else 0
        print(f'{f}: {len(df)} features, Annotation:SSO={has_sso}, annotated={annotated}')
    # Show sample
    if files:
        sample = pd.read_csv(os.path.join(genomes_dir, files[0]), sep='\t')
        if 'Annotation:SSO' in sample.columns:
            print(f'\nSample annotations from {files[0]}:')
            display(sample[['gene_id', 'functions', 'Annotation:SSO']].head(10))

## Step 6a: Build Single Metabolic Model (Debug Mode)

Tests model building for **one** genome in the current process (no parallelism).
This cell replicates the worker logic of `pipeline_run_moddeling_analysis()`
inline, so each stage can be inspected and stepped through when debugging.

- **Input**: Genome TSV with `Annotation:SSO` column
- **Output**: COBRA JSON model in `<pipeline_dir>/models/<genome_id>_model.json`
- **Note**: Edit `test_genome_id` to pick which genome to test
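
Once a model file exists, it can be checked without loading COBRApy, since the saved file is plain JSON. The sketch below assumes the top-level `id`, `reactions`, `metabolites`, and `genes` keys that COBRApy's JSON export normally writes; `summarize_model_json` is a hypothetical helper.

```python
import json


def summarize_model_json(model_path):
    """Return the model id and the sizes of the main lists in a COBRA JSON file."""
    with open(model_path) as handle:
        data = json.load(handle)
    return {
        "id": data.get("id"),
        # Missing keys are reported as zero rather than raising.
        "reactions": len(data.get("reactions", [])),
        "metabolites": len(data.get("metabolites", [])),
        "genes": len(data.get("genes", [])),
    }
```

This is useful for a fast sanity check across many model files; anything involving simulation still needs the model loaded through COBRApy.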

In [1]:
%run util.py
import os
import pandas as pd  # explicit import, in case util.py does not expose pd
import cobra
from modelseedpy.core.msgenome import MSGenome, MSFeature
from modelseedpy import MSModelUtil

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    parameters=config['parameters'],
    kb_version=config.get('kb_version', 'dev'),
    worker_count=config['worker_count'],
)

# Pick a genome to test
genomes_file = os.path.join(config['pipeline_dir'], 'user_genomes.tsv')
user_genomes = pd.read_csv(genomes_file, sep='\t')
test_genome_id = user_genomes.iloc[0]['genome_id']  # change index to pick another
print(f'Building model for: {test_genome_id}')
print('=' * 60)

# Load features from genome TSV
genome_tsv = os.path.join(config['pipeline_dir'], 'genomes', f'{test_genome_id}.tsv')
gene_df = pd.read_csv(genome_tsv, sep='\t')

safe_id = test_genome_id.replace('.', '_')
models_dir = os.path.join(config['pipeline_dir'], 'models')
os.makedirs(models_dir, exist_ok=True)

# Create MSGenome from features
genome = MSGenome()
genome.id = safe_id
genome.scientific_name = test_genome_id

ms_features = []
for _, gene in gene_df.iterrows():
    protein = gene.get('protein_translation', '')
    gene_id = gene.get('gene_id', '')
    if pd.notna(protein) and protein:
        feature = MSFeature(gene_id, str(protein))
        # Parse Annotation:SSO column
        # Format: SSO:nnnnn:description|rxn1,rxn2;SSO:mmmmm:desc2|rxn3
        sso_col = gene.get('Annotation:SSO', '')
        if pd.notna(sso_col) and sso_col:
            for entry in str(sso_col).split(';'):
                entry = entry.strip()
                if not entry:
                    continue
                term_part = entry.split('|')[0]
                parts = term_part.split(':')
                if len(parts) >= 2 and parts[0] == 'SSO':
                    sso_id = parts[0] + ':' + parts[1]
                    feature.add_ontology_term('SSO', sso_id)
                    # Extract description for classifier
                    if len(parts) >= 3:
                        description = ':'.join(parts[2:])
                        if description:
                            feature.add_ontology_term('RAST', description)
        ms_features.append(feature)

genome.add_features(ms_features)
print(f'MSGenome: {len(ms_features)} protein features')

# Build the model using pipeline's reconstruction utils
genome_classifier = pipeline.get_classifier()
build_output, mdlutl = pipeline.build_metabolic_model(
    genome=genome,
    genome_classifier=genome_classifier,
    model_id=safe_id,
    model_name=test_genome_id,
    gs_template='auto',
    atp_safe=True,
    load_default_medias=True,
    max_gapfilling=10,
    gapfilling_delta=0,
)

if mdlutl is not None:
    model = mdlutl.model
    print(f'\nModel built: {model.id}')
    print(f'  Reactions: {len(model.reactions)}')
    print(f'  Metabolites: {len(model.metabolites)}')
    print(f'  Genes: {len(model.genes)}')
    print(f'  Class: {build_output.get("Class", "N/A")}')
    print(f'  Core GF: {build_output.get("Core GF", 0)}')

    # Gapfill on Carbon-Pyruvic-Acid
    gapfill_media = pipeline.get_media('KBaseMedia/Carbon-Pyruvic-Acid')
    gf_output, _, _, _ = pipeline.gapfill_metabolic_model(
        mdlutl=mdlutl,
        genome=genome,
        media_objs=[gapfill_media],
        templates=[model.template],
        atp_safe=True,
        objective='bio1',
        minimum_objective=0.01,
        gapfilling_mode='Sequential',
    )
    print(f'  GS GF: {gf_output.get("GS GF", 0)}')
    print(f'  Growth: {gf_output.get("Growth", "N/A")}')

    # Save model
    model_path = os.path.join(models_dir, f'{safe_id}_model.json')
    cobra.io.save_json_model(model, model_path)
    print(f'  Saved: {model_path}')
else:
    print(f'\nModel build returned None: {build_output}')

/Users/chenry/Dropbox/Projects/KBUtilLib/src


2026-02-03 13:35:28,458 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-03 13:35:28,459 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-03 13:35:28,460 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2026-02-03 13:35:28,461 - __main__.NotebookUtil - INFO - Notebook name: test_pipeline_steps
2026-02-03 13:35:28,461 - __main__.NotebookUtil - INFO - Notebook environment detected


modelseedpy 0.4.2


2026-02-03 13:35:28,824 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - Using directly provided configuration dictionary


loading biochemistry database from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase


2026-02-03 13:35:33,725 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - ModelSEED database loaded from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase
2026-02-03 13:35:34,106 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - CRITICAL - KBase version not set up for modeling!
2026-02-03 13:35:34,107 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - SKANI database cache: /Users/chenry/.kbutillib/skani_databases.json
  Via conda: conda install -c bioconda skani
  Via cargo: cargo install skani
  From source: https://github.com/bluenote-1577/skani
  Or set 'skani.executable' in config.yaml to the full path


cobrakbase 0.4.0
Building model for: Test3
MSGenome: 4528 protein features


https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations




INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd11632 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08701 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00187 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00425 not found in model!
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:22.625000000000107; min objective:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:26.249999999999517; min objective:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:3.374999999999912; min objective:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.3333333333333362; min objective:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.0; min objective:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:1.3333333333333344; min objec

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn01241_c0
INFO:modelseedpy.core.msmodelutl:Akg.O2/rxn00062_c0:rxn01241_c0> needed:-1.6507085360670924e-15 with min obj:2
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8ee610>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn01241_c0': '>'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8ee610>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn01241_c0': '>'}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:3.374999999999912; min objective:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8f5c70>, 'target': 'rxn00062_c0', 'minobjective': 1.875, 'binary_check': False,

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03079_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn46184_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn13974_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03020_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn06299_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn17445_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn02480_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03085_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15961_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03126_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00173_c0
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn03079_c0< needed:-7.309900651889522e-16 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn46184_c0< needed:0.0 with min obj:0.01

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn01187_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn01241_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00173_c0
INFO:modelseedpy.core.msmodelutl:Akg/rxn00062_c0:rxn01187_c0> needed:-1.9606045221934896e-15 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Akg/rxn00062_c0:rxn01241_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:rxn00173_c0< not needed:0.5000000000000009
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f9573d0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn01187_c0': '>', 'rxn01241_c0': '>'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f9573d0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn01187_c0': '>', 'rxn01241_c0': '>'}, 'rever

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:2.874999999999947; min objective:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f94f940>, 'target': 'rxn00062_c0', 'minobjective': 1.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f94f940>, 'target': 'rxn00062_c0', 'minobjective': 1.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:1.0000000000000016; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03079_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn46184_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn13974_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn06299_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15962_c0

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f94f4f0>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f94f4f0>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:4.999999999999935; min objective:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f94f1f0>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f94f1f0>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.core.msmodelutl:Ac.NO2/rxn00062_c0:rxn00548_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8cd9a0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn00548_c0': '<'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8cd9a0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn00548_c0': '<'}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:3.000000000000044; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.core.msmodelutl:Ac.NO/rxn00062_c0:rxn00548_c0< needed:-1.2360762807809203e-15 with min obj:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media targe

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8cdfd0>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8cdfd0>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.9999999999999845; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03079_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn46184_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn05759_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03020_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn06299_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn17

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03079_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn46184_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn13974_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03020_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn06299_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn17445_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn02480_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03085_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15961_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03126_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00173_c0
INFO:modelseedpy.core.msmodelutl:H2.Ac/rxn00062_c0:rxn03079_c0< needed:9.980469446314597e-17 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:H2.Ac/rxn00062_c0:rxn46184_c0< needed:0.0 with min obj:0.01

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03079_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn46184_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn05759_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03020_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn06299_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn17445_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn02480_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03085_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15961_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03126_c0
INFO:modelseedpy.core.msmodelutl:For.SO4/rxn00062_c0:rxn03079_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4/rxn00062_c0:rxn46184_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4/rxn00062_c0:rxn05759_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4/rxn

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24606_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24607_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15961_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03126_c0
INFO:modelseedpy.core.msmodelutl:Methanol.H2/rxn00062_c0:rxn24606_c0> needed:-8.524664005385659e-17 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Methanol.H2/rxn00062_c0:rxn24607_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Methanol.H2/rxn00062_c0:rxn03127_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Methanol.H2/rxn00062_c0:rxn15961_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Methanol.H2/rxn00062_c0:rxn03126_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8c0c10>, 'target': 'rxn00062_c0', 'minobjec

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: EX_cpd00425_e0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24608_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24609_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24610_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15961_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03126_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24611_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn33011_c0
INFO:modelseedpy.core.msmodelutl:Dimethylamine.H2/rxn00062_c0:EX_cpd00425_e0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Dimethylamine.H2/rxn00062_c0:rxn24608_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Dimethylamine.H2/rxn00062_c0:rxn24609_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Dimethylamine.H2/rxn00062_c0:rxn24610_c0> needed:0.0 with min obj:0.01

Running Independent gapfilling!
{<modelseedpy.core.msmedia.MSMedia object at 0x35f8ee940>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8ee940>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn09269_c0': '>'}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x35f8ee610>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8ee610>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn01241_c0': '>'}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x35f8f5c70>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8f5c70>, 'target': 'rxn00062_c0', 'minobjective': 1.875, 'binary_check': False, 'new': {}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x35f957df0>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f957df0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn0307

INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:Expansion time:empty:0.5549575000000004
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1805
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.O2:0.06121912500000093
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1805
INFO:modelseedpy.core.msmodelutl:Expansion time:Ac.O2:0.052439375000002286
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1805
INFO:modelseedpy.core.msmodelutl:Expansion time:Etho.O2:0.05264170899999954
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1805
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.O2:0.05216758399999932
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1805
INFO:modelseedpy.core.msmodelutl:Expansion time:Glyc.O2:0.06603549999999814
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1805
INFO:modelseedpy.core.msmodelutl:Expansion time:Fum.O2


Model built: Test3
  Reactions: 1562
  Metabolites: 1365
  Genes: 1121
  Class: Gram Negative
  Core GF: 0
Tests: [{'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8c03a0>, 'is_max_threshold': True, 'threshold': 1e-05, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f96f820>, 'is_max_threshold': True, 'threshold': 31.799999999999795, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f96fc70>, 'is_max_threshold': True, 'threshold': 7.800000000002388, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f96ffd0>, 'is_max_threshold': True, 'threshold': 14.700000000000031, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8ee5e0>, 'is_max_threshold': True, 'threshold': 12.450000000000026, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35f8ee880>, 'is_max_threshold': True, 'threshold': 18.15000000

INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.6255695356616103; min objective:0.01


Full model: 9141
Gapfilling count: 8963
Reaction list: 9141


INFO:modelseedpy.core.msmodelutl:Expansion time:empty:180.66161037499998
INFO:modelseedpy.core.msmodelutl:Filtered count:246 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.O2:36.43695441699998
INFO:modelseedpy.core.msmodelutl:Filtered count:274 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Ac.O2:27.558596457999954
INFO:modelseedpy.core.msmodelutl:Filtered count:286 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Etho.O2:0.4690147500000421
INFO:modelseedpy.core.msmodelutl:Filtered count:286 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.O2:0.406324541999993
INFO:modelseedpy.core.msmodelutl:Filtered count:286 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Glyc.O2:0.6383890420000284
INFO:modelseedpy.core.msmodelutl:Filtered count:286 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Fum.O2:0.5312957499999698
INFO:modelseedpy.core.msmodelutl:Filtered count:286 out of 12810
INFO:modelseedpy.core.msmodel

Running Sequential gapfilling!


INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.263284507947727; min objective:0.01
INFO:modelseedpy.core.msmodelutl:Expansion time:empty:0.27410695799994755
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.O2:0.18652787500002432
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Ac.O2:0.13601170799995543
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Etho.O2:1.0028333340000017
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.O2:0.14810995799996363
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Glyc.O2:0.14197891599997092
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Fum.O2:0.13544308299992736

  GS GF: 70
  Growth: Carbon-Pyruvic-Acid:0.19774639777748967
  Saved: /Users/chenry/Dropbox/Projects/KBDatalakeApps/notebooks/data/pipeline_test/models/Test3_model.json


## Step 6b: Build All Metabolic Models (Parallel)

Tests `pipeline_run_moddeling_analysis()`.

Builds metabolic models for **all** genomes in parallel with `ProcessPoolExecutor`,
matching the production pipeline's behavior. The `Annotation:SSO` column of each
genome TSV file supplies the functional annotations.

- **Input**: All genome TSV files in `<pipeline_dir>/genomes/` with `Annotation:SSO`
- **Output**: COBRA JSON models in `<pipeline_dir>/models/`
- **Uses**: `worker_count` parallel processes
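
The fan-out described above can be sketched as follows. This is a minimal illustration, not the code inside `KBDataLakeUtils`: `summarize_genome` and `run_in_parallel` are hypothetical names, and the worker only stands in for a real per-genome model build. The `executor_cls` argument is an assumption added for convenience, so the same loop can run on threads while debugging in a notebook.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def summarize_genome(genome_id):
    """Hypothetical stand-in for a per-genome model build."""
    return {"genome": genome_id, "name_length": len(genome_id)}


def run_in_parallel(genome_ids, worker, worker_count, executor_cls=ProcessPoolExecutor):
    """Run worker(genome_id) for every genome and collect results by genome ID."""
    results, errors = {}, {}
    with executor_cls(max_workers=worker_count) as pool:
        # Map each future back to its genome so results can be labeled
        futures = {pool.submit(worker, gid): gid for gid in genome_ids}
        for done, future in enumerate(as_completed(futures), start=1):
            gid = futures[future]
            try:
                results[gid] = future.result()
            except Exception as exc:  # one failed genome should not stop the rest
                errors[gid] = str(exc)
            print(f"[{done}/{len(futures)}] {gid}")
    return results, errors
```

Calling `run_in_parallel(['Test3', 'Test4'], summarize_genome, worker_count=4)` would use separate processes. With a process pool the worker must be picklable, so it usually needs to live in an importable module rather than in a notebook cell. Passing `executor_cls=ThreadPoolExecutor` sidesteps that restriction for quick checks.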

In [3]:
%run util.py
import os

import cobra

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    parameters=config['parameters'],
    kb_version=config.get('kb_version', 'dev'),
    worker_count=config['worker_count'],
)

# Run the actual pipeline method (builds one model per genome in parallel)
pipeline.pipeline_run_moddeling_analysis()

# Inspect output: reload each saved model and report its size
models_dir = os.path.join(config['pipeline_dir'], 'models')
if os.path.isdir(models_dir):
    model_files = sorted(f for f in os.listdir(models_dir) if f.endswith('_model.json'))
    print(f'\nModels built ({len(model_files)}):')
    for mf in model_files:
        model = cobra.io.load_json_model(os.path.join(models_dir, mf))
        print(f'  {mf}: {len(model.reactions)} rxns, {len(model.metabolites)} mets, {len(model.genes)} genes')
else:
    print('No models directory created')

2026-02-03 23:02:52,410 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-03 23:02:52,411 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-03 23:02:52,411 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2026-02-03 23:02:52,412 - __main__.NotebookUtil - INFO - Notebook name: test_pipeline_steps
2026-02-03 23:02:52,412 - __main__.NotebookUtil - INFO - Notebook environment detected
2026-02-03 23:02:52,413 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - Using directly provided configuration dictionary
2026-02-03 23:02:52,413 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - ModelSEED database loaded from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase


/Users/chenry/Dropbox/Projects/KBUtilLib/src


2026-02-03 23:02:52,781 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - CRITICAL - KBase version not set up for modeling!
2026-02-03 23:02:52,782 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - SKANI database cache: /Users/chenry/.kbutillib/skani_databases.json
  Via conda: conda install -c bioconda skani
  Via cargo: cargo install skani
  From source: https://github.com/bluenote-1577/skani
  Or set 'skani.executable' in config.yaml to the full path



Building 2 models with 4 workers
modelseedpy 0.4.2
modelseedpy 0.4.2


2026-02-03 23:02:57,838 - KBDatalakeApps.KBDatalakeUtils.WorkerUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-03 23:02:57,838 - KBDatalakeApps.KBDatalakeUtils.WorkerUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-03 23:02:57,838 - KBDatalakeApps.KBDatalakeUtils.WorkerUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2026-02-03 23:02:57,839 - KBDatalakeApps.KBDatalakeUtils.WorkerUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-03 23:02:57,839 - KBDatalakeApps.KBDatalakeUtils.WorkerUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-03 23:02:57,839 - KBDatalakeApps.KBDatalakeUtils.WorkerUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token


loading biochemistry database from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase
loading biochemistry database from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase


2026-02-03 23:03:02,593 - KBDatalakeApps.KBDatalakeUtils.WorkerUtil - INFO - ModelSEED database loaded from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase
2026-02-03 23:03:02,594 - KBDatalakeApps.KBDatalakeUtils.WorkerUtil - INFO - ModelSEED database loaded from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase


cobrakbase 0.4.0
cobrakbase 0.4.0


https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations




INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd11632 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd11632 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08701 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08701 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00187 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00187 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00425 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00425 not found in model!
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:22.625000000000107; min objective:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:22.625000000000107; min objective:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:22.624999999999943; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn09269_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00173_c0
INFO:modelseedpy.core.msmodelutl:rxn00548_c0< not needed:15.124999999999998
INFO:modelseedpy.core.msmodelutl:Succ.O2/rxn00062_c0:rxn09269_c0> needed:6.049928856858541e-15 with min obj:2
INFO:modelseedpy.core.msmodelutl:rxn00173_c0> not needed:15.124999999999984
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c3623a0>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn09269_c0': '>'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c3623a0>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_che

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn01241_c0
INFO:modelseedpy.core.msmodelutl:Akg.O2/rxn00062_c0:rxn01241_c0> needed:1.1866810160945163e-15 with min obj:2
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c344760>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn01241_c0': '>'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c344760>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn01241_c0': '>'}, 'reversed': {}}
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn01241_c0
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:3.374999999999912; min objective:0.01
INFO:modelseedpy.core.msmodelutl:Akg.O2/rxn00062_c0:rxn01241_c0> needed:-1.6507085360670924e-15 with min obj:2
INFO:modelseedpy.core.msgapfill:Cumulative med

Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.3333333333333362; min objective:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360eadc40>, 'target': 'rxn00062_c0', 'minobjective': 1.875, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360eadc40>, 'target': 'rxn00062_c0', 'minobjective': 1.875, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.3333333333333362; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03079_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn46184_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn13974_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03020_c0

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn46184_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn13974_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn03127_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn03020_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn06299_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn00548_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:1.3333333333333255; min objective:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn17445_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn02480_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn03085_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Ac/rxn00062_c0:rxn15961_c0> needed:0.0 with min 

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:1.5833333333333328; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn01187_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn01241_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00173_c0
INFO:modelseedpy.core.msmodelutl:Akg/rxn00062_c0:rxn01187_c0> needed:-1.9418751515428538e-15 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Akg/rxn00062_c0:rxn01241_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:rxn00173_c0< not needed:0.5000000000000023
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c3732b0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn01187_c0': '>', 'rxn01241_c0': '>'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c3732b0>, 'target': 'rxn00

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c373d30>, 'target': 'rxn00062_c0', 'minobjective': 1.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c373d30>, 'target': 'rxn00062_c0', 'minobjective': 1.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:1.0000000000000016; min objective:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ec73d0>, 'target': 'rxn00062_c0', 'minobjective': 1.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ec73d0>, 'target': 'rxn00062_c0', 'minobjective': 1.5,

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn40505_c0
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn03079_c0< needed:-1.7049328010771313e-16 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn46184_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:16.874999999999826; min objective:0.01
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn13974_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn06299_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn15962_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn00548_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn17445_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn02480_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.NO/rxn00062_c0:rxn030

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ec7040>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ec7040>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:4.999999999999935; min objective:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c34ca30>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 'binary_check': False, 'new': {}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c34ca30>, 'target': 'rxn00062_c0', 'minobjective': 2.5, 

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.core.msmodelutl:Ac.NO2/rxn00062_c0:rxn00548_c0< needed:-2.1311489515851118e-16 with min obj:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c34c820>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn00548_c0': '<'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c34c820>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn00548_c0': '<'}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:3.000000000000044; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.core.msmodelutl:Ac.NO2/rxn00062_c0:rxn00548_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media targ

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:1.6666666666666667; min objective:0.01
INFO:modelseedpy.core.msmodelutl:Ac.NO/rxn00062_c0:rxn00548_c0< needed:-1.2360762807809203e-15 with min obj:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ee1df0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn00548_c0': '<'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ee1df0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn00548_c0': '<'}, 'reversed': {}}
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:1.666666666666667; min objective:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedp

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.9999999999999845; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03079_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn46184_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn05759_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03020_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn06299_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn17445_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn02480_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03085_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15961_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03126_c0
INFO:modelseedpy.core.msmodelutl:H2.CO2/rxn00062_c0:rxn03079_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:H2.CO2/rxn00062_c0:rxn46184_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn03079_c0< needed:1.7049328010771317e-16 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn46184_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn05759_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn03127_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn03020_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn06299_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn17445_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn02480_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn03085_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4.H2/rxn00062_c0:rxn15961_c0> needed:0.0 with min obj:0.01

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msmodelutl:For.SO4/rxn00062_c0:rxn03085_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4/rxn00062_c0:rxn15961_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:For.SO4/rxn00062_c0:rxn03126_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msgapfill:Cumulative media target solution: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c3773d0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn03079_c0': '<', 'rxn46184_c0': '<', 'rxn05759_c0': '<', 'rxn03127_c0': '>', 'rxn03020_c0': '>', 'rxn06299_c0': '>', 'rxn17445_c0': '>', 'rxn02480_c0': '<', 'rxn03085_c0': '>', 'rxn15961_c0': '>', 'rxn03126_c0': '>'}, 'reversed': {}}
INFO:modelseedpy.core.msmodelutl:Adding gapfilling:{'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c3773d0>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn03079_c0': '<', 'rxn46184_c0': '<', 'rx

Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24606_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24607_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn13974_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03020_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15962_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00548_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn00173_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn40505_c0
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.5000000000000028; min objective:0.01
INFO:modelseedpy.core.msmodelutl:Methanol/rxn00062_c0:rxn24606_c0> needed:-1.781678342637051e-16 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Methanol/rxn00062_c0:rxn24607_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Methanol/rxn00062_c0:rxn13974_c0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Methanol/rxn00062_c0:rxn03020_c0< needed:0.0 with min obj:0.01


Running Independent gapfilling!
Running Independent gapfilling!
Running Independent gapfilling!


INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:1.0; min objective:0.01
INFO:modelseedpy.core.msgapfill:integrating rxn: EX_cpd00425_e0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24608_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24609_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24610_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15961_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03126_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24611_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn33011_c0
INFO:modelseedpy.core.msmodelutl:Dimethylamine.H2/rxn00062_c0:EX_cpd00425_e0< needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Dimethylamine.H2/rxn00062_c0:rxn24608_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Dimethylamine.H2/rxn00062_c0:rxn24609_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Dimeth

Running Independent gapfilling!
Running Independent gapfilling!
{<modelseedpy.core.msmedia.MSMedia object at 0x35c3623a0>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c3623a0>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn09269_c0': '>'}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x35c344760>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c344760>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn01241_c0': '>'}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x35c344df0>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c344df0>, 'target': 'rxn00062_c0', 'minobjective': 1.875, 'binary_check': False, 'new': {}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x35c344700>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c344700>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary

INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24608_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24609_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24610_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03127_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24613_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn09318_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24612_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn15961_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn03126_c0
INFO:modelseedpy.core.msgapfill:integrating rxn: rxn24611_c0
INFO:modelseedpy.core.msmodelutl:Trimethylamine.H2/rxn00062_c0:rxn24608_c0> needed:1.7048646020319228e-16 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Trimethylamine.H2/rxn00062_c0:rxn24609_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Trimethylamine.H2/rxn00062_c0:rxn24610_c0> needed:0.0 with min obj:0.01
INFO:modelseedpy.core.msmodelutl:Trimethylamine.H2/rxn00

{<modelseedpy.core.msmedia.MSMedia object at 0x360ed7220>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ed7220>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn09269_c0': '>'}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x360ed7940>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ed7940>, 'target': 'rxn00062_c0', 'minobjective': 2, 'binary_check': False, 'new': {'rxn01241_c0': '>'}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x360eadc40>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360eadc40>, 'target': 'rxn00062_c0', 'minobjective': 1.875, 'binary_check': False, 'new': {}, 'reversed': {}}, <modelseedpy.core.msmedia.MSMedia object at 0x360ead730>: {'growth': 0, 'media': <modelseedpy.core.msmedia.MSMedia object at 0x360ead730>, 'target': 'rxn00062_c0', 'minobjective': 0.01, 'binary_check': False, 'new': {'rxn03079_c0': '<', 'rxn46184_c0': '<', 

INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:Expansion time:empty:1.2614477919999985
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1803
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.O2:0.0548230839999988
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1803
INFO:modelseedpy.core.msmodelutl:Expansion time:empty:0.5576221660000016
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1805
INFO:modelseedpy.core.msmodelutl:Expansion time:Ac.O2:0.049891708000000534
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1803
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.O2:0.05908333400000032
INFO:modelseedpy.core.msmodelutl:Filtered count:6 out of 1805
INFO:modelseedpy.core.msmodelutl:Expansion time:Etho.O2:0.04824504099999

Tests: [{'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c377a00>, 'is_max_threshold': True, 'threshold': 1e-05, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c362d90>, 'is_max_threshold': True, 'threshold': 31.799999999999834, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c362520>, 'is_max_threshold': True, 'threshold': 7.800000000000007, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c362a90>, 'is_max_threshold': True, 'threshold': 14.700000000000031, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c362b20>, 'is_max_threshold': True, 'threshold': 12.450000000000019, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c362340>, 'is_max_threshold': True, 'threshold': 18.15000000000002, 'objective': 'rxn00062_c0'}, {'media': <modelseedpy.core.msmedia.MSMedia object at 0x35c3625b0>, 'is

INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.6255695356616043; min objective:0.01
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.6255695356616103; min objective:0.01


Full model: 9141
Gapfilling count: 8963
Reaction list: 9141
Full model: 9141
Gapfilling count: 8963
Reaction list: 9141


INFO:modelseedpy.core.msmodelutl:Expansion time:empty:182.314142291
INFO:modelseedpy.core.msmodelutl:Filtered count:246 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:empty:186.566114542
INFO:modelseedpy.core.msmodelutl:Filtered count:246 out of 12811
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.O2:36.18496645800002
INFO:modelseedpy.core.msmodelutl:Filtered count:274 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.O2:36.35391416599998
INFO:modelseedpy.core.msmodelutl:Filtered count:274 out of 12811
INFO:modelseedpy.core.msmodelutl:Expansion time:Ac.O2:26.329450624999993
INFO:modelseedpy.core.msmodelutl:Filtered count:286 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Etho.O2:0.45365308300000606
INFO:modelseedpy.core.msmodelutl:Filtered count:286 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.O2:1.4875171250000108
INFO:modelseedpy.core.msmodelutl:Filtered count:286 out of 12810
INFO:modelseedpy.core.msmodelutl:Expansi

Running Sequential gapfilling!


INFO:modelseedpy.core.msmodelutl:Expansion time:Ac.NO3:0.33096550000004754
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 2021
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.DMSO:0.7577864170000339
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 2021
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.TMAO:0.5609472499999129
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 2021
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.26559443706640035; min objective:0.01
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.DMSO:0.42634687500003565
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 2021
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.TMAO:0.45112829200002125
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 2021
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.SO4:0.43279629099993144
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 2021
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.SO3:0.368

Running Sequential gapfilling!


INFO:modelseedpy.core.msmodelutl:Expansion time:For.NO3:0.13837279200004104
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 75
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.NO2:0.17292391599994517
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 75
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.NO3:0.9896081670000285
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 75
INFO:modelseedpy.fbapkg.gapfillingpkg:Objective with gapfill database:0.263284507947727; min objective:0.01
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.NO:0.1909221250000428
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 75
INFO:modelseedpy.core.msmodelutl:Expansion time:Ac.NO3:0.14793379200000345
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 75
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.DMSO:0.1869772909999483
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 75
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.TMAO:0.14129208300005303
INF

[1/2] Test4: 1635 rxns, 1100 genes, class=Gram Negative


INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.NO2:1.0221457090000285
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.NO3:0.14681595799993374
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.NO:0.1707878749999736
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Ac.NO3:0.14039229100001194
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.DMSO:0.1859407499999861
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Glc.TMAO:0.14479104200006532
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Pyr.DMSO:0.14405454099994586
INFO:modelseedpy.core.msmodelutl:Filtered count:0 out of 70
INFO:modelseedpy.core.msmodelutl:Expansion time:Py

[2/2] Test3: 1632 rxns, 1121 genes, class=Gram Negative

Models built (2):
  Test3_model.json: 1632 rxns, 1384 mets, 1121 genes
  Test4_model.json: 1635 rxns, 1390 mets, 1100 genes


## Step 7: Run Phenotype Simulations

Tests `pipeline_run_phenotype_simulations()`.

Runs phenotype simulations for every built model in parallel using `ProcessPoolExecutor`,
then builds summary tables from the per-model simulation results.

- **Input**: COBRA JSON models in `<pipeline_dir>/models/`
- **Output**: Per-model JSON in `<pipeline_dir>/phenotypes/` and summary TSV tables
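
The fan-out described above can be sketched with `concurrent.futures`. This is a minimal illustration, not the `KBDataLakeUtils` implementation: `summarize_model` and `run_over_models` are hypothetical names, and the worker only counts reactions in a COBRA JSON file as a stand-in for a real phenotype simulation. The executor class is a parameter so the same code can run with threads where process pools are awkward.

```python
import json
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def summarize_model(model_path):
    """Hypothetical worker: a stand-in for one phenotype simulation.

    Loads a COBRA JSON model and returns (model_id, reaction_count).
    """
    with open(model_path) as handle:
        model = json.load(handle)
    model_id = os.path.basename(model_path).replace('_model.json', '')
    return model_id, len(model.get('reactions', []))


def run_over_models(models_dir, worker_count, executor_cls=ProcessPoolExecutor):
    """Submit one task per *_model.json file and collect results as they finish."""
    paths = sorted(
        os.path.join(models_dir, name)
        for name in os.listdir(models_dir)
        if name.endswith('_model.json')
    )
    results = {}
    with executor_cls(max_workers=worker_count) as pool:
        futures = [pool.submit(summarize_model, path) for path in paths]
        for done, future in enumerate(as_completed(futures), start=1):
            model_id, rxn_count = future.result()  # re-raises worker errors here
            results[model_id] = rxn_count
            print(f'[{done}/{len(futures)}] {model_id}: {rxn_count} rxns')
    return results


# Demo on two tiny placeholder models (not real pipeline output).
demo_dir = tempfile.mkdtemp()
for name, rxn_total in [('DemoA', 3), ('DemoB', 5)]:
    with open(os.path.join(demo_dir, f'{name}_model.json'), 'w') as handle:
        json.dump({'id': name, 'reactions': [{'id': f'r{i}'} for i in range(rxn_total)]}, handle)

# Threads are used for the demo; pass ProcessPoolExecutor for CPU-bound work.
demo_results = run_over_models(demo_dir, worker_count=2, executor_cls=ThreadPoolExecutor)
```

Process pools pickle the worker function, so in a notebook the worker generally has to live in an importable module rather than in a cell, particularly on platforms that start workers with `spawn`.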

In [1]:
%run util.py
import os

import pandas as pd  # explicit import for the read_csv call below

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    parameters=config['parameters'],
    kb_version=config.get('kb_version', 'dev'),
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_run_phenotype_simulations()

# Inspect output
phenotypes_dir = os.path.join(config['pipeline_dir'], 'phenotypes')
if os.path.exists(phenotypes_dir):
    files = os.listdir(phenotypes_dir)
    json_count = len([f for f in files if f.endswith('.json')])
    tsv_count = len([f for f in files if f.endswith('.tsv')])
    print(f'\nPhenotype results: {json_count} simulation JSONs, {tsv_count} summary TSVs')
    for f in sorted(files):
        if f.endswith('.tsv'):
            df = pd.read_csv(os.path.join(phenotypes_dir, f), sep='\t')
            print(f'\n  {f}: {len(df)} rows')
            display(df.head())
else:
    print('No phenotypes directory created')

/Users/chenry/Dropbox/Projects/KBUtilLib/src


2026-02-04 09:53:18,128 - __main__.NotebookUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-04 09:53:18,128 - __main__.NotebookUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-04 09:53:18,129 - __main__.NotebookUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2026-02-04 09:53:18,129 - __main__.NotebookUtil - INFO - Notebook name: test_pipeline_steps
2026-02-04 09:53:18,130 - __main__.NotebookUtil - INFO - Notebook environment detected


modelseedpy 0.4.2


2026-02-04 09:53:18,384 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - Using directly provided configuration dictionary


loading biochemistry database from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase


2026-02-04 09:53:23,405 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - ModelSEED database loaded from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase
2026-02-04 09:53:23,788 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - CRITICAL - KBase version not set up for modeling!
2026-02-04 09:53:23,789 - KBDatalakeApps.KBDatalakeUtils.KBDataLakeUtils - INFO - SKANI database cache: /Users/chenry/.kbutillib/skani_databases.json
  Via conda: conda install -c bioconda skani
  Via cargo: cargo install skani
  From source: https://github.com/bluenote-1577/skani
  Or set 'skani.executable' in config.yaml to the full path


cobrakbase 0.4.0

Running phenotype simulations for 2 models with 4 workers
modelseedpy 0.4.2
modelseedpy 0.4.2


2026-02-04 09:53:25,676 - KBDatalakeApps.KBDatalakeUtils.PhenotypeWorkerUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-04 09:53:25,676 - KBDatalakeApps.KBDatalakeUtils.PhenotypeWorkerUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-04 09:53:25,676 - KBDatalakeApps.KBDatalakeUtils.PhenotypeWorkerUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token
2026-02-04 09:53:25,698 - KBDatalakeApps.KBDatalakeUtils.PhenotypeWorkerUtil - INFO - Loaded configuration from: /Users/chenry/.kbutillib/config.yaml
2026-02-04 09:53:25,699 - KBDatalakeApps.KBDatalakeUtils.PhenotypeWorkerUtil - INFO - Loaded 0 tokens from /Users/chenry/.tokens
2026-02-04 09:53:25,699 - KBDatalakeApps.KBDatalakeUtils.PhenotypeWorkerUtil - INFO - Loaded kbase tokens from /Users/chenry/.kbase/token


loading biochemistry database from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase
loading biochemistry database from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase


2026-02-04 09:53:30,578 - KBDatalakeApps.KBDatalakeUtils.PhenotypeWorkerUtil - INFO - ModelSEED database loaded from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase
2026-02-04 09:53:30,622 - KBDatalakeApps.KBDatalakeUtils.PhenotypeWorkerUtil - INFO - ModelSEED database loaded from /Users/chenry/Dropbox/Projects/ModelSEEDDatabase


cobrakbase 0.4.0
cobrakbase 0.4.0


INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd11632 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08701 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00187 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00425 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd11632 not found in model!
INFO:modelseedpy.core.msatpcorrection:max_gapfilling: 10, best_score: 0.0
INFO:modelseedpy.core.msmodelutl:cpd08701 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00187 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd00425 not found in model!
INFO:modelseedpy.core.msatpcorrection:max_gapfilling: 10, best_score: 0.0
INFO:modelseedpy.core.msmodelutl:cpd08021 not found in model!
INFO:modelseedpy.core.msmodelutl:cpd08021 not 

Added 0 exchanges to model Test3_gf
Added 0 exchanges to model Test4_gf
cpd00020 optimal 3.9549279555494654
cpd00020 optimal 4.20828271865244
Added 0 exchanges to model Test3_gf
cpd00023 optimal 5.4230482328477905
Added 0 exchanges to model Test4_gf
cpd00023 optimal 5.576942939883526
Added 0 exchanges to model Test3_gf
cpd00024 optimal 5.775737151920804
Added 0 exchanges to model Test4_gf
cpd00024 optimal 5.929015378555213
Added 0 exchanges to model Test3_gf
cpd00027 optimal 7.347574651168088
Added 0 exchanges to model Test4_gf
cpd00027 optimal 7.440400196132978
Added 0 exchanges to model Test3_gf
cpd00029 optimal 2.477786189018972
Added 0 exchanges to model Test4_gf
cpd00029 optimal 2.6365144743364666
[1/2] Test3: simulated 0 phenotypes
[2/2] Test4: simulated 0 phenotypes
Built phenotype summary tables in /Users/chenry/Dropbox/Projects/KBDatalakeApps/notebooks/data/pipeline_test/phenotypes

Phenotype results: 4 simulation JSONs, 2 summary TSVs

  gapfilled_reactions.tsv: 2847 rows


Unnamed: 0,genome_id,reaction_id,reaction_name,gpr,lower_bound,upper_bound
0,Test4,rxn02201_c0,"2-amino-4-hydroxy-6-hydroxymethyl-7,8-dihydrop...",b3177 or b0142,0.0,1000.0
1,Test4,rxn00351_c0,gamma-L-glutamyl-L-cysteine:glycine ligase (AD...,b2947,0.0,1000.0
2,Test4,rxn00836_c0,IMP:diphosphate phospho-D-ribosyltransferase [c0],b0125,-1000.0,0.0
3,Test4,rxn02209_c0,"(R)-Propane-1,2-diol:NAD+ oxidoreductase [c0]",b2799,-1000.0,1000.0
4,Test4,rxn05318_c0,TRANS-RXN-203.ce [c0],b2964,-1000.0,1000.0



  genome_accuracy.tsv: 2 rows


Unnamed: 0,genome_id
0,Test4
1,Test3


## Step 8: Build SQLite Database

Tests `pipeline_build_sqllite_db()`.

Compiles all output data into a single SQLite database.

- **Input**: All TSV output files from previous pipeline steps
- **Output**: `<pipeline_dir>/berdl_tables.db` with tables:
  `genome`, `genome_ani`, `genome_features`, `genome_accuracy`,
  `genome_gene_phenotype_reactions`, `genome_phenotype_gaps`, `gapfilled_reactions`
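
One way to picture this step is loading each TSV into its own table. The sketch below uses only the standard library; `load_tsv_into_table` is a hypothetical helper, every column is stored as text, and the real `pipeline_build_sqllite_db()` may type columns or name tables differently.

```python
import csv
import os
import sqlite3
import tempfile


def load_tsv_into_table(conn, tsv_path, table_name):
    """Hypothetical helper: copy one TSV file into a SQLite table (all TEXT columns)."""
    with open(tsv_path, newline='') as handle:
        reader = csv.reader(handle, delimiter='\t')
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    # Double quotes make arbitrary header names valid SQL identifiers.
    columns = ', '.join('"{}" TEXT'.format(col.replace('"', '""')) for col in header)
    placeholders = ', '.join('?' for _ in header)
    safe_table = table_name.replace('"', '""')
    conn.execute(f'DROP TABLE IF EXISTS "{safe_table}"')
    conn.execute(f'CREATE TABLE "{safe_table}" ({columns})')
    conn.executemany(f'INSERT INTO "{safe_table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Demo with a tiny placeholder TSV (columns and values are made up).
demo_dir = tempfile.mkdtemp()
demo_tsv = os.path.join(demo_dir, 'demo_table.tsv')
with open(demo_tsv, 'w', newline='') as handle:
    handle.write('genome_id\tvalue\nDemoA\t1\nDemoB\t2\n')

demo_conn = sqlite3.connect(os.path.join(demo_dir, 'demo.db'))
inserted = load_tsv_into_table(demo_conn, demo_tsv, 'demo_table')
row_count = demo_conn.execute('SELECT COUNT(*) FROM "demo_table"').fetchone()[0]
demo_conn.close()
```

Dropping and recreating the table keeps the sketch idempotent, which suits a test harness that is rerun against the same `pipeline_dir`.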

In [None]:
%run util.py
import os
import sqlite3

import pandas as pd  # explicit import for the read_sql_query call below

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    workspace_name=config['workspace_name'],
    parameters=config['parameters'],
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_build_sqllite_db()

# Inspect the database
db_path = os.path.join(config['pipeline_dir'], 'berdl_tables.db')
if os.path.exists(db_path):
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
        tables = [row[0] for row in cursor.fetchall()]
        print(f'\nSQLite database: {db_path}')
        print(f'Tables ({len(tables)}):')
        for table in tables:
            # Brackets quote table names that contain unusual characters
            cursor.execute(f'SELECT COUNT(*) FROM [{table}]')
            count = cursor.fetchone()[0]
            print(f'  {table}: {count} rows')
            sample = pd.read_sql_query(f'SELECT * FROM [{table}] LIMIT 3', conn)
            display(sample)
    finally:
        # Release the database file even if a query fails
        conn.close()
else:
    print(f'Database not found: {db_path}')

## Step 9: Save Annotated Genomes to KBase

Tests `pipeline_save_annotated_genomes()`.

Saves RAST-annotated genomes back to the KBase workspace and creates a GenomeSet.

- **Input**: Genome TSV files with `Annotation:SSO`, plus `user_genomes.tsv` for the genome refs
- **Output**: New genome objects and a GenomeSet in the KBase workspace
- **Warning**: This step writes to the KBase workspace; use a test workspace.
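
Because this step and the two that follow write to KBase, it can help to fail fast when the configured workspace is not one you intend to modify. The check below is a hypothetical guard, not part of `KBDataLakeUtils`; `require_allowed_workspace` and the allowlist contents are assumptions for illustration.

```python
def require_allowed_workspace(workspace_name, allowed_workspaces):
    """Hypothetical guard: raise unless workspace_name is explicitly allowlisted."""
    if workspace_name not in allowed_workspaces:
        raise ValueError(
            f'Refusing to write to {workspace_name!r}; '
            f'add it to the allowlist if it is a test workspace.'
        )
    return workspace_name


# Placeholder names; replace with your own test workspace(s).
ALLOWED_TEST_WORKSPACES = {'myuser:narrative_test'}

checked = require_allowed_workspace('myuser:narrative_test', ALLOWED_TEST_WORKSPACES)

try:
    require_allowed_workspace('myuser:narrative_production', ALLOWED_TEST_WORKSPACES)
    blocked = False
except ValueError:
    blocked = True
```

A call such as `require_allowed_workspace(config['workspace_name'], ALLOWED_TEST_WORKSPACES)` placed before `pipeline.pipeline_save_annotated_genomes()` would stop the cell before anything is written.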

In [None]:
%run util.py

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    workspace_name=config['workspace_name'],
    parameters=config['parameters'],
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_save_annotated_genomes()

## Step 10: Save Models to KBase

Tests `pipeline_save_models_to_kbase()`.

Saves built COBRA metabolic models to the KBase workspace.

- **Input**: COBRA JSON models in `<pipeline_dir>/models/`, plus `user_genomes.tsv`
- **Output**: FBAModel objects in the KBase workspace
- **Warning**: This step writes to the KBase workspace; use a test workspace.

In [None]:
%run util.py

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    workspace_name=config['workspace_name'],
    parameters=config['parameters'],
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_save_models_to_kbase()

## Step 11: Generate KBase Report

Tests `pipeline_save_kbase_report()`.

Generates an HTML report summarizing pipeline results and saves it to KBase.

- **Input**: All pipeline output data (genomes, models, SQLite DB)
- **Output**: KBase report object with an HTML viewer and a downloadable SQLite DB
- **Warning**: This step writes to the KBase workspace; use a test workspace.

In [None]:
%run util.py

config = util.load('pipeline_config')
pipeline = util.create_pipeline_utils(
    directory=config['pipeline_dir'],
    workspace_name=config['workspace_name'],
    parameters=config['parameters'],
    worker_count=config['worker_count'],
)

# Run the actual pipeline method
pipeline.pipeline_save_kbase_report()

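# Read both attributes defensively; report_ref may be unset even when report_name exists
report_name = getattr(pipeline, 'report_name', None)
if report_name is not None:
    report_ref = getattr(pipeline, 'report_ref', 'unknown ref')
    print(f'\nReport: {report_name} ({report_ref})')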

## Inspection: Review All Pipeline Outputs

A consolidated view of the outputs from every pipeline step.
No `KBDataLakeUtils` instance is needed; this cell only reads the filesystem.

In [None]:
%run util.py
import os
import sqlite3

import pandas as pd  # explicit import for the read_csv calls below

config = util.load('pipeline_config')
pipeline_dir = config['pipeline_dir']

print('=' * 60)
print('PIPELINE OUTPUT SUMMARY')
print('=' * 60)

dirs_to_check = [
    ('user_genomes.tsv', 'User Genomes Table'),
    ('assemblies', 'Assemblies'),
    ('genomes', 'Genome Gene Tables'),
    ('skani', 'SKANI Results'),
    ('models', 'Metabolic Models'),
    ('phenotypes', 'Phenotype Simulations'),
    ('berdl_tables.db', 'SQLite Database'),
]

for item, label in dirs_to_check:
    full_path = os.path.join(pipeline_dir, item)
    if os.path.isfile(full_path):
        size_kb = os.path.getsize(full_path) / 1024
        print(f'\n{label}: {full_path} ({size_kb:.1f} KB)')
        if full_path.endswith('.tsv'):
            df = pd.read_csv(full_path, sep='\t')
            print(f'  Rows: {len(df)}, Columns: {list(df.columns)}')
    elif os.path.isdir(full_path):
        files = os.listdir(full_path)
        print(f'\n{label}: {full_path} ({len(files)} files)')
        for f in sorted(files):
            fp = os.path.join(full_path, f)
            size_kb = os.path.getsize(fp) / 1024
            if f.endswith('.tsv'):
                row_count = len(pd.read_csv(fp, sep='\t'))
                print(f'  {f}: {row_count} rows ({size_kb:.1f} KB)')
            else:
                print(f'  {f}: {size_kb:.1f} KB')
    else:
        print(f'\n{label}: NOT FOUND')

# SQLite summary
db_path = os.path.join(pipeline_dir, 'berdl_tables.db')
if os.path.exists(db_path):
    print(f'\n{"=" * 60}')
    print('SQLite Database Tables:')
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
        for (table,) in cursor.fetchall():
            cursor.execute(f'SELECT COUNT(*) FROM [{table}]')
            print(f'  {table}: {cursor.fetchone()[0]} rows')
    finally:
        # Close the connection even if a query raises
        conn.close()