# GillesPy2 -> Tellurium training dataset generation

In this notebook, StochNetV2's dataset generation serves as the reference that a Tellurium-based dataset generation process should reproduce.

StochNetV2's implementation is fairly complex, and at this stage the only goal is to feed Tellurium's stochastic simulation trajectories to StochNetV2 models. It is therefore enough to build a bridge that converts Tellurium's output into the layout StochNetV2 expects, rather than modifying StochNetV2's code to accept Tellurium's data format.

## StochNetV2's GillesPy2 generation

In [112]:
%load_ext autoreload
%autoreload 2


In [129]:
from stochnet_v2.dataset.simulation_gillespy import build_simulation_dataset
from stochnet_v2.utils.file_organisation import ProjectFileExplorer

import tellurium as te

# other
from pathlib import Path
import numpy as np
from importlib import import_module
import pandas as pd

In [166]:
# StochNetV2 dataset configuration
name = "SIR"
n_species = 3
params = ["beta", "gamma"]

model_name = name
nb_settings = len(params)
nb_trajectories = 2
timestep = 0.1
endtime = 1.0
dataset_id = name

project_folder = Path.cwd() / model_name
project_explorer = ProjectFileExplorer(project_folder)
dataset_explorer = project_explorer.get_dataset_file_explorer(timestep, dataset_id)

In [167]:
# Generate and save initial settings
CRN_module = import_module(model_name)
CRN_class = getattr(CRN_module, model_name)

settings = CRN_class.get_initial_settings(nb_settings)
np.save(dataset_explorer.settings_fp, settings)

In [172]:
# StochNetV2 dataset generation
dataset = build_simulation_dataset(
    model_name,
    nb_settings,
    nb_trajectories,
    timestep,
    endtime,
    dataset_explorer.dataset_folder,
    params_to_randomize=params,
    how="concat",
)

100%|██████████| 2/2 [00:00<00:00, 1280.31it/s]


In [173]:
print(f"shape: {dataset.shape}")
print("- 4: 2 initial settings x 2 trajectories per setting")
print("- 11: time steps from 0.0 to 1.0 with step size 0.1")
print("- 6: time + 3 species + 2 parameters")

shape: (4, 11, 6)
- 4: 2 initial settings x 2 trajectories per setting
- 11: time steps from 0.0 to 1.0 with step size 0.1
- 6: time + 3 species + 2 parameters
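
The shape reported above can be recomputed from the configuration values alone. The sketch below is only an arithmetic check inferred from the observed output, not something taken from StochNetV2's code; the names `n_blocks`, `n_steps`, `n_columns` and `expected_shape` are introduced here for illustration.

```python
# Values copied from the configuration cell of this notebook.
n_species = 3
params = ["beta", "gamma"]
nb_settings = len(params)
nb_trajectories = 2
timestep = 0.1
endtime = 1.0

# One block per (initial setting, trajectory) pair.
n_blocks = nb_settings * nb_trajectories
# Uniform time grid from 0.0 to endtime, both end points included.
n_steps = int(round(endtime / timestep)) + 1
# Columns: time, then the species counts, then the randomised parameters.
n_columns = 1 + n_species + len(params)

expected_shape = (n_blocks, n_steps, n_columns)
print(expected_shape)  # (4, 11, 6)
```

Any Tellurium-based replacement has to produce an array with this same layout.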


In [178]:
print("one block per trajectory; blocks from the same initial setting share the initial state and the randomised parameter values:\n")
for block in dataset:
    df = pd.DataFrame(block, columns=["time", "S", "I", "R", "beta", "gamma"])
    print(df, "\n")

one block per trajectory; blocks from the same initial setting share the initial state and the randomised parameter values:

    time      S      I      R      beta  gamma
0    0.0  173.0  106.0  117.0  2.607408    1.0
1    0.1  171.0  120.0  105.0  2.607408    1.0
2    0.2  166.0  142.0   88.0  2.607408    1.0
3    0.3  157.0  158.0   81.0  2.607408    1.0
4    0.4  144.0  173.0   79.0  2.607408    1.0
5    0.5  136.0  185.0   75.0  2.607408    1.0
6    0.6  131.0  195.0   70.0  2.607408    1.0
7    0.7  119.0  214.0   63.0  2.607408    1.0
8    0.8  113.0  225.0   58.0  2.607408    1.0
9    0.9  115.0  233.0   48.0  2.607408    1.0
10   1.0  107.0  241.0   48.0  2.607408    1.0 

    time      S      I      R      beta  gamma
0    0.0  173.0  106.0  117.0  2.607408    1.0
1    0.1  180.0  116.0  100.0  2.607408    1.0
2    0.2  171.0  133.0   92.0  2.607408    1.0
3    0.3  163.0  153.0   80.0  2.607408    1.0
4    0.4  158.0  164.0   74.0  2.607408    1.0
5    0.5  156.0  176.0   64.0  2.607408    1.0
6    0.6  151.0  1

## Tellurium's Gillespie implementation

In [324]:
# Load SIR model from SBML
model = te.loadSBMLModel("example.xml")

In [325]:
# Configure simulator
model.integrator = "gillespie"
model.integrator.seed = 42

In [330]:
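def randomize_parameters(model, sigma=0.1, n=1):
    """Return n parameter settings, each a dict of parameter name -> value.

    Setting i perturbs only parameter i % num_parameters, by a uniform relative
    shift in [-sigma, sigma); all other parameters keep their model defaults.
    """
    parameter_names = model.getGlobalParameterIds()
    parameter_values = model.getGlobalParameterValues()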

    random_parameters = []
    num_parameters = len(parameter_names)

    for i in range(n):
        iteration_parameters = {}

        for j, (name, value) in enumerate(zip(parameter_names, parameter_values)):
            if i % num_parameters == j:
                shift = np.random.uniform(-sigma, sigma) * value
                iteration_parameters[name] = value + shift
            else:
                # keep the default value for other parameters
                iteration_parameters[name] = value

        random_parameters.append(iteration_parameters)

    return random_parameters


def randomize_species_concentrations(model, n=1):
    species_names = model.getFloatingSpeciesConcentrationIds()
    species_values = model.getFloatingSpeciesConcentrations()

    random_concentrations = []

    for _ in range(n):
        iteration_concentrations = {}

        for name, value in zip(species_names, species_values):
            low = max(0, int(value / 2))  # initial amounts are whole numbers and cannot be negative
            # np.random.randint excludes `high` and fails if high <= low (e.g. value == 0)
            high = max(low + 1, int(value * 2))
            iteration_concentrations[name] = np.random.randint(low, high)

        random_concentrations.append(iteration_concentrations)

    return random_concentrations

def assign_custom_values(model, value_dict):
    for prop_name, prop_value in value_dict.items():
        model[prop_name] = prop_value
        
    return model
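
Note that `model.integrator.seed` only fixes the simulator's random stream, while the helpers above draw from NumPy's global random state, so the randomised settings change between runs. Below is a minimal sketch of a reproducible variant, assuming the default parameter values are passed in as a plain dict. The function name `randomize_parameters_seeded` and the values in `defaults` are hypothetical placeholders, not taken from `example.xml`.

```python
import numpy as np


def randomize_parameters_seeded(defaults, rng, sigma=0.1, n=1):
    """Same scheme as randomize_parameters, but driven by an explicit generator.

    defaults: dict mapping parameter name -> default value (assumed input).
    rng: a numpy.random.Generator, e.g. np.random.default_rng(seed).
    """
    names = list(defaults)
    settings = []
    for i in range(n):
        setting = dict(defaults)  # start from the default values
        name = names[i % len(names)]  # perturb one parameter per setting
        setting[name] = defaults[name] * (1.0 + rng.uniform(-sigma, sigma))
        settings.append(setting)
    return settings


defaults = {"beta": 3.0, "gamma": 1.0}  # placeholder values for illustration

# Two generators created with the same seed yield identical settings.
first = randomize_parameters_seeded(defaults, np.random.default_rng(42), sigma=0.2, n=2)
second = randomize_parameters_seeded(defaults, np.random.default_rng(42), sigma=0.2, n=2)
print(first == second)  # True
```

With the notebook's model, `defaults` could be built from `model.getGlobalParameterIds()` and `model.getGlobalParameterValues()`, as `randomize_parameters` already does.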

In [331]:
# Simulation configuration to mimic StochNetV2's
initial_settings = 2
simulations_per_setting = 2
steps = 11  # number of output points: 0.0 to 1.0 in increments of 0.1
end_time = 1.0

In [332]:
# generate random initial concentrations and parameter variations
init_concentrations = randomize_species_concentrations(model, initial_settings)
randomized_parameters = randomize_parameters(model, 0.2, initial_settings)

results = []
for init_setting in range(initial_settings):
    for _ in range(simulations_per_setting):
        model.reset()
        model = assign_custom_values(model, init_concentrations[init_setting])
        model = assign_custom_values(model, randomized_parameters[init_setting])
        sim = model.simulate(0.0, end_time, steps)
            
        results.append(sim)
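
To hand these trajectories to StochNetV2, the per-simulation tables still need to be stacked into one array with the `(trajectories, time steps, time + species + parameters)` layout seen earlier. The following is a minimal sketch of such a bridge, not StochNetV2's or Tellurium's own API. The function `to_stochnet_array` is hypothetical, and the stand-in arrays replace real simulation results so the snippet runs without Tellurium. It assumes the species columns are named `[S]`, `[I]`, `[R]` and may arrive in a different order than StochNetV2's `S, I, R`.

```python
import numpy as np


def to_stochnet_array(results, colnames, param_dicts, sims_per_setting, species, params):
    """Stack simulation tables into a (n_trajectories, n_steps, n_columns) array."""
    # Target column order: time, then the species in the requested order.
    wanted = ["time"] + [f"[{s}]" for s in species]
    idx = [colnames.index(c) for c in wanted]
    blocks = []
    for i, sim in enumerate(results):
        traj = np.asarray(sim, dtype=float)[:, idx]
        # Consecutive simulations share one parameter setting.
        setting = param_dicts[i // sims_per_setting]
        # Repeat the parameter values on every row, one column per parameter.
        param_cols = np.tile([setting[p] for p in params], (traj.shape[0], 1))
        blocks.append(np.hstack([traj, param_cols]))
    return np.stack(blocks)


# Stand-in data: 4 simulations, 11 time points, columns time, [I], [R], [S].
colnames = ["time", "[I]", "[R]", "[S]"]
fake = np.column_stack(
    [np.linspace(0.0, 1.0, 11), np.full(11, 1.0), np.full(11, 2.0), np.full(11, 3.0)]
)
fake_results = [fake, fake, fake, fake]
fake_params = [{"beta": 3.2, "gamma": 0.8}, {"beta": 3.0, "gamma": 1.1}]

bridged = to_stochnet_array(
    fake_results, colnames, fake_params, 2, ["S", "I", "R"], ["beta", "gamma"]
)
print(bridged.shape)  # (4, 11, 6)
```

With the notebook's own objects, the call would presumably take `results`, `list(results[0].colnames)`, `randomized_parameters` and `simulations_per_setting` in place of the stand-ins.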

In [333]:
for i, block in enumerate(results):
    df = pd.DataFrame(block, columns=block.colnames)
    # Add the randomized parameters as new columns
    for param_name, param_value in randomized_parameters[i // simulations_per_setting].items():
        df[param_name] = param_value
    print(df, "\n")

    time    [I]    [R]   [S]     beta     gamma
0    0.0  159.0  184.0  59.0  3.23487  0.815101
1    0.1  149.0  200.0  53.0  3.23487  0.815101
2    0.2  134.0  217.0  51.0  3.23487  0.815101
3    0.3  131.0  230.0  41.0  3.23487  0.815101
4    0.4  129.0  237.0  36.0  3.23487  0.815101
5    0.5  124.0  247.0  31.0  3.23487  0.815101
6    0.6  120.0  254.0  28.0  3.23487  0.815101
7    0.7  112.0  262.0  28.0  3.23487  0.815101
8    0.8  104.0  272.0  26.0  3.23487  0.815101
9    0.9   97.0  279.0  26.0  3.23487  0.815101
10   1.0   94.0  283.0  25.0  3.23487  0.815101 

    time    [I]    [R]   [S]     beta     gamma
0    0.0  159.0  184.0  59.0  3.23487  0.815101
1    0.1  150.0  196.0  56.0  3.23487  0.815101
2    0.2  140.0  210.0  52.0  3.23487  0.815101
3    0.3  130.0  223.0  49.0  3.23487  0.815101
4    0.4  127.0  232.0  43.0  3.23487  0.815101
5    0.5  122.0  240.0  40.0  3.23487  0.815101
6    0.6  116.0  249.0  37.0  3.23487  0.815101
7    0.7  110.0  259.0  33.0  3.23487 