# Sampling Genome-Scale Models

In this notebook we demonstrate a few tricks for sampling genome-scale models defined in SBML with maximum efficiency.
To keep the compute times minimal in this demo, we demonstrate the required API calls using e_coli_core.

Note that we have used code similar to this to sample Recon3D.

In [1]:
import hopsy
from PolyRound.api import PolyRoundApi
import os
import time
import numpy as np

# Load model

In [2]:
model_path = os.path.join("test_data", "e_coli_core.xml")
polytope = PolyRoundApi.sbml_to_polytope(model_path)
# Note: Gurobi is used here. Academic licenses are available for free. If you don't have Gurobi, the fallback is GLPK.

Set parameter Username
Academic license - for non-commercial use only - expires 2025-03-22


In [3]:
print(polytope.A.shape)
print(polytope.b.shape)
# print(r_polytope.A.shape)
# print(r_polytope.b.shape)

(190, 95)
(190,)


# Generate hopsy problem
Generate the problem object from the polytope definition and preprocess it by bounding and rounding the polytope.
We add the bounds to ensure that the uniform distribution is well-defined on the polytope, and we round to increase sampling efficiency.
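
To illustrate what bounding means for a polytope given as `A x <= b`, here is a minimal NumPy sketch. The helper `add_box` is hypothetical and is not hopsy's implementation; it only stacks the rows that encode `lower <= x_i <= upper`, and it does not attempt whatever simplification `simplify=True` performs in the hopsy call.

```python
import numpy as np


def add_box(A, b, lower, upper):
    """Append rows encoding lower <= x_i <= upper to the system A x <= b.

    Hypothetical helper for illustration only, not part of hopsy.
    """
    n = A.shape[1]
    identity = np.eye(n)
    # x_i <= upper  is  I x <= upper;  x_i >= lower  is  -I x <= -lower
    A_box = np.vstack([A, identity, -identity])
    b_box = np.concatenate([b, np.full(n, upper), np.full(n, -lower)])
    return A_box, b_box


# Toy example: the half-plane x0 + x1 <= 1 is unbounded on its own.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
A_box, b_box = add_box(A, b, lower=-10_000, upper=10_000)

inside = np.array([0.0, 0.0])
outside = np.array([-20_000.0, 0.0])  # satisfies x0 + x1 <= 1 but violates the box
print(np.all(A_box @ inside <= b_box), np.all(A_box @ outside <= b_box))
```

Without the box, a uniform distribution on the toy half-plane would not be normalizable; with it, the feasible region has finite volume.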

In [4]:
problem = hopsy.Problem(polytope.A, polytope.b)
problem = hopsy.add_box_constraints(problem, upper_bound=10_000, lower_bound=-10_000, simplify=True)
problem = hopsy.round(problem)

In [6]:
print(polytope.A.shape)
print(polytope.b.shape)
print(problem.A.shape)
print(problem.b.shape)

(190, 95)
(190,)
(190, 95)
(190,)


# Set up Markov chains and random number generators
We require the seed to be specified manually, because this makes the choice of seed explicit. A fixed seed is required for scientific reproducibility.
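
The reproducibility argument can be illustrated with plain NumPy. The sketch below is an assumption-laden analogy, not how hopsy creates its generators: the hypothetical helper `make_rngs` derives one independent stream per chain from a single seed, and the same seed always yields the same draws.

```python
import numpy as np


def make_rngs(seed, n_chains):
    """Derive one independent NumPy generator per chain from a single seed.

    Hypothetical helper for illustration; hopsy.setup has its own mechanism.
    """
    children = np.random.SeedSequence(seed).spawn(n_chains)
    return [np.random.default_rng(child) for child in children]


# Two runs with the same seed produce identical draws chain by chain.
first = [rng.random(3) for rng in make_rngs(511, 4)]
second = [rng.random(3) for rng in make_rngs(511, 4)]
same_seed_matches = all(np.array_equal(a, b) for a, b in zip(first, second))

# A different seed produces different draws.
other = [rng.random(3) for rng in make_rngs(512, 4)]
other_seed_differs = not np.array_equal(first[0], other[0])
print(same_seed_matches, other_seed_differs)
```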

In [7]:
seed = 511
chains, rngs = hopsy.setup(problem, seed)
n_samples = 100
# Either use a thinning rule, see 10.1371/journal.pcbi.1011378,
# or use the one-shot transformation (for expert users). We show the one-shot transformation at the end.
thinning = int(1./6*problem.transformation.shape[1])

In [8]:
start = time.perf_counter()
accrate, samples = hopsy.sample(chains, rngs, n_samples, thinning=thinning)
print("sampling with internal trafo took", time.perf_counter()-start,"seconds")
# accrate is 1 for uniform samples with the default chains given by hopsy.setup()

sampling with internal trafo took 0.10038722099852748 seconds


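# Evaluate sample quality
For the highest statistical quality, it is advised to check that rhat < 1.01 and ess / n_chains > 100.
With only 100 samples per chain, the short demo run below does not reach these thresholds; more samples are needed in practice.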
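
For intuition about what rhat measures, here is a minimal sketch of the classic Gelman-Rubin statistic in NumPy. It is the textbook form, written under the assumption that samples are shaped `(n_chains, n_samples, dim)`; `hopsy.rhat` may use a more refined variant, so the function `classic_rhat` is illustrative only.

```python
import numpy as np


def classic_rhat(samples):
    """Textbook Gelman-Rubin R-hat per dimension.

    samples: array of shape (n_chains, n_samples, dim).
    Illustrative only; not necessarily what hopsy.rhat computes.
    """
    n_samples = samples.shape[1]
    chain_means = samples.mean(axis=1)                      # (n_chains, dim)
    within = samples.var(axis=1, ddof=1).mean(axis=0)       # W: mean within-chain variance
    between = n_samples * chain_means.var(axis=0, ddof=1)   # B: between-chain variance
    var_hat = (n_samples - 1) / n_samples * within + between / n_samples
    return np.sqrt(var_hat / within)


rng = np.random.default_rng(0)
# Four chains drawing from the same distribution: R-hat should be close to 1.
mixed = rng.normal(size=(4, 1000, 3))
# Same draws, but two chains are shifted: R-hat should be clearly above 1.
offsets = np.array([0.0, 5.0, 0.0, 5.0]).reshape(4, 1, 1)
stuck = mixed + offsets

rhat_mixed = classic_rhat(mixed)
rhat_stuck = classic_rhat(stuck)
print(rhat_mixed.max(), rhat_stuck.min())
```

Values near 1 indicate that the chains agree with each other; chains that explore different regions inflate the between-chain variance and push the statistic up.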

In [9]:
rhat = np.max(hopsy.rhat(samples))
print("rhat:", rhat)
ess = np.min(hopsy.ess(samples)) / len(chains)
print("ess:", ess)

rhat: 1.1020489881077324
ess: 7.377708203148892


# Expert mode: one-shot back transformation
By postponing the back transformation from the rounded space to the original space, we can obtain better performance for high-dimensional models.
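
The back transformation maps each rounded-space sample `y` to `x = T y + s`, where `T` is the transformation matrix and `s` the shift. The sketch below uses random stand-in arrays (all shapes and values are hypothetical) to show that applying the map to the whole sample stack in one matrix product gives the same result as a per-chain loop.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes; a real model would take these from the problem.
n_chains, n_samples, dim_rounded, dim_full = 4, 50, 3, 5
transformation = rng.normal(size=(dim_full, dim_rounded))  # stand-in for T
shift = rng.normal(size=dim_full)                          # stand-in for s
rounded = rng.normal(size=(n_chains, n_samples, dim_rounded))

# One shot: x = T y + s for every sample of every chain via broadcasting.
full = rounded @ transformation.T + shift

# Reference: the same map applied chain by chain.
looped = np.zeros((n_chains, n_samples, dim_full))
for i in range(n_chains):
    looped[i] = (transformation @ rounded[i].T).T + shift

print(full.shape, np.allclose(full, looped))
```

Doing the product once after sampling avoids a matrix-vector multiplication for every stored state during the run.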

In [10]:
assert problem.transformation is not None

In [19]:
# Save the transformation and shift before clearing them on the problem.
# Run this cell only once: after they are cleared, a second run would store None.
transformation = problem.transformation
shift = problem.shift
problem.transformation = None
problem.shift = None
seed = 512
chains, rngs = hopsy.setup(problem, seed)
n_samples = 100
# Thinning is still advised when hard drive space is limited, to avoid storing too many samples
thinning = int(1*problem.A.shape[1])
start = time.perf_counter()
accrate, sample_stack = hopsy.sample(chains, rngs, n_samples, thinning=thinning)
print("sampling took", time.perf_counter()-start,"seconds")
# accrate is 1 for uniform samples with the default chains given by hopsy.setup()

print('sample shape', sample_stack.shape)
rhat = np.max(hopsy.rhat(sample_stack))
print("rhat:", rhat)
ess = np.min(hopsy.ess(sample_stack)) / len(chains)
print("ess:", ess)

# Transform samples back all at once: x = transformation @ y + shift
shift_t = np.array([shift]).T
start_trafo = time.perf_counter()
full_samples = np.zeros((len(chains), n_samples, transformation.shape[0]))
for i in range(len(chains)):
    full_samples[i] = (transformation@sample_stack[i].T).T + np.tile(shift_t, (1, n_samples)).T

print("transformation took", time.perf_counter()-start_trafo,"seconds")

print(full_samples)
rhat = np.max(hopsy.rhat(full_samples))
print("rhat:", rhat)
ess = np.min(hopsy.ess(full_samples)) / len(chains)
print("ess:", ess)

sampling took 0.05042209000021103 seconds
sample shape (20, 100, 95)
rhat: 1.0074546102111943
ess: 91.55345979324146
transformation took 0.0003334030006953981 seconds
[printed array of back-transformed samples; output truncated]
