# Simulators #

## Hand-crafted Simulator - One Entity ##

Here we will demonstrate a simple simulator with one entity.

The entity will have a single random variable with three possible values, each with equal probability. We create a Compiled Knowledge PGM for the entity. Don't worry too much about how the PGM is created: in most Synthorus use cases the needed PGMs are created automatically.

In [1]:
from ck.pgm_compiler import DEFAULT_PGM_COMPILER
from ck.pgm_circuit.wmc_program import WMCProgram
from ck.pgm import PGM

pgm = PGM()
patient_age = pgm.new_rv('patient_age', ('young', 'middle_aged', 'old'))
pgm.new_factor(patient_age).set_dense().set_uniform()
wmc = WMCProgram(DEFAULT_PGM_COMPILER(pgm))
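Conceptually, this PGM encodes nothing more than a uniform distribution over the three states of `patient_age`. The following is a minimal pure-Python sketch of what sampling from it amounts to. It uses only the standard library and is not the Compiled Knowledge or Synthorus API; `sample_patient_age` is a hypothetical helper.

```python
import random
from collections import Counter

# States of the random variable, as declared with pgm.new_rv above.
PATIENT_AGE_STATES = ('young', 'middle_aged', 'old')


def sample_patient_age(rng: random.Random) -> str:
    """Draw one state, each with probability 1/3 (a uniform factor)."""
    return rng.choice(PATIENT_AGE_STATES)


rng = random.Random(42)  # seeded so the sketch is reproducible
counts = Counter(sample_patient_age(rng) for _ in range(30_000))
# Each state should appear roughly 10,000 times.
print(counts)
```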

We create a `Simulator` object, add an entity that samples from the compiled PGM, and give the entity one field, `age`, which is sampled from the random variable `patient_age`.

In [2]:
from synthorus.simulator.pgm_sim_sampler import PGMSimSampler
from synthorus.simulator.simulator import Simulator

sim = Simulator()
patient = sim.add_entity('patient', sampler=PGMSimSampler(wmc))
patient.add_field_sampled(field_name='age', rv_name='patient_age')

<synthorus.simulator.sim_field.SimField at 0x2a62ed6deb0>

Before running the simulator, we need somewhere to send the records. We will just use a `DebugRecorder`, which sends records to stdout.

In [3]:
from synthorus.simulator.sim_recorder import DebugRecorder

sim_recorder = DebugRecorder()

Now we can run the simulator...

In [4]:
sim.run(sim_recorder)

Entity: patient ['_id_', '_count_', 'age']

patient [('_id_', 1), ('_count_', 1), ('age', 'old')]

Finished


We can run the simulator for multiple iterations...

In [5]:
sim.run(sim_recorder, iterations=5)

Entity: patient ['_id_', '_count_', 'age']

patient [('_id_', 1), ('_count_', 1), ('age', 'young')]
patient [('_id_', 2), ('_count_', 1), ('age', 'old')]
patient [('_id_', 3), ('_count_', 1), ('age', 'young')]
patient [('_id_', 4), ('_count_', 1), ('age', 'middle_aged')]
patient [('_id_', 5), ('_count_', 1), ('age', 'old')]

Finished


The "patient" entity is a root entity, so it has an implicit parent entity for each iteration of the run. The default cardinality of a parent-child relationship is one-to-one, which is why the "_count_" field of every record is 1, while "_id_" keeps increasing across iterations.
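The `_id_` and `_count_` bookkeeping can be modelled with a short generator. This is a sketch under the assumption, consistent with the outputs in this notebook, that `_id_` increases across the whole run while `_count_` restarts at 1 for each parent; `id_count_pairs` is a hypothetical name, not part of Synthorus.

```python
from typing import Iterator, Tuple


def id_count_pairs(iterations: int, children_per_parent: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (_id_, _count_) pairs for a root entity.

    Hypothetical model: each iteration is one implicit parent. `_id_` keeps
    increasing across the whole run, while `_count_` restarts at 1 per parent.
    """
    next_id = 1
    for _ in range(iterations):  # one implicit parent per iteration
        for count in range(1, children_per_parent + 1):
            yield next_id, count
            next_id += 1


print(list(id_count_pairs(iterations=5)))
# [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
print(list(id_count_pairs(iterations=2, children_per_parent=3)))
# [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3)]
```

With three children per parent, the same model reproduces the `_id_`/`_count_` pattern of the two-iteration run further down.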

To show this more clearly, and to demonstrate simulator parameters, we will add a parameter called "number_of_patients" and use it to control the cardinality of the patient entity.

In [6]:
number_of_patients = sim.add_parameter('number_of_patients', 3)
patient.add_cardinality_variable_count(number_of_patients)

Now running the simulator will show three patients per iteration.

In [7]:
sim.run(sim_recorder)

Entity: patient ['_id_', '_count_', 'age']

patient [('_id_', 1), ('_count_', 1), ('age', 'young')]
patient [('_id_', 2), ('_count_', 2), ('age', 'young')]
patient [('_id_', 3), ('_count_', 3), ('age', 'middle_aged')]

Finished


In [8]:
sim.run(sim_recorder, iterations=2)

Entity: patient ['_id_', '_count_', 'age']

patient [('_id_', 1), ('_count_', 1), ('age', 'old')]
patient [('_id_', 2), ('_count_', 2), ('age', 'young')]
patient [('_id_', 3), ('_count_', 3), ('age', 'middle_aged')]
patient [('_id_', 4), ('_count_', 1), ('age', 'middle_aged')]
patient [('_id_', 5), ('_count_', 2), ('age', 'old')]
patient [('_id_', 6), ('_count_', 3), ('age', 'old')]

Finished


Note that the value of a simulator parameter can be modified.

In [9]:
number_of_patients.value = 4
sim.run(sim_recorder)

Entity: patient ['_id_', '_count_', 'age']

patient [('_id_', 1), ('_count_', 1), ('age', 'old')]
patient [('_id_', 2), ('_count_', 2), ('age', 'old')]
patient [('_id_', 3), ('_count_', 3), ('age', 'middle_aged')]
patient [('_id_', 4), ('_count_', 4), ('age', 'old')]

Finished
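One way to read this behaviour is that the cardinality rule holds a reference to the parameter object rather than a copy of its value, so a new value takes effect on the next run. The sketch below models that idea with hypothetical `Parameter` and `CountLimit` classes (not Synthorus types), using the `>=` comparison that appears in the simulator JSON specification later in this notebook.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Parameter:
    """Hypothetical stand-in for a simulator parameter: a named, mutable value."""
    name: str
    value: int


class CountLimit:
    """Hypothetical cardinality rule that reads its limit from a Parameter.

    It stores the Parameter object itself, not its current value, so
    changing `parameter.value` later affects subsequent runs.
    """

    def __init__(self, parameter: Parameter):
        self.parameter = parameter

    def stop(self, count: int) -> bool:
        # Stop once `count` child records have been created.
        return count >= self.parameter.value


def run_counts(rule: CountLimit) -> List[int]:
    """Return the _count_ values produced for one parent."""
    counts = []
    count = 0
    while not rule.stop(count):
        count += 1
        counts.append(count)
    return counts


demo_param = Parameter('number_of_patients', 3)
rule = CountLimit(demo_param)
print(run_counts(rule))  # [1, 2, 3]
demo_param.value = 4
print(run_counts(rule))  # [1, 2, 3, 4]
```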


## Hand-crafted Simulator - Two Entities ##

In this demonstration, we will handcraft a simulator with two entities.

The first entity is "patient" with random variable "age" that is uniformly distributed across three possible values.

The second entity is "event" with random variables "type", "duration", and "duration_since_last". Entity "event" will be a child of "patient". The value of "type" will depend on patient age, and the other two random variables will be uniformly distributed.

We will create a PGM that can be used by both entities. Once again, don't worry about the details of the PGM.

In [10]:
pgm = PGM()

patient_age = pgm.new_rv('patient_age', ('young', 'middle_aged', 'old'))
event_type = pgm.new_rv('event_type', ('ED', 'AP', 'GP', 'DEATH'))
event_duration = pgm.new_rv('event_duration', range(1, 10))
event_duration_since_last = pgm.new_rv('event_duration_since_last', range(1, 10))

pgm.new_factor(patient_age).set_dense().set_uniform()
pgm.new_factor(event_type, patient_age).set_cpt().set(
    # patient_age  ED,   AP,  GP,  DEATH
    ((0,),         (0.0, 0.0, 1.0, 0.0)),  # young
    ((1,),         (0.5, 0.0, 0.5, 0.0)),  # middle_aged
    ((2,),         (0.3, 0.3, 0.3, 0.1)),  # old
)
pgm.new_factor(event_duration).set_dense().set_uniform()
pgm.new_factor(event_duration_since_last).set_dense().set_uniform()

wmc = WMCProgram(DEFAULT_PGM_COMPILER(pgm))
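To see what the conditional factor means, here is a plain-Python sketch that samples `event_type` from the CPT row selected by the patient's age. `CPT` and `sample_event_type` are hypothetical names; the probabilities are copied from the factor above, with row indices 0, 1, 2 mapped to the age states named in its comments. Notice that "DEATH" has zero probability unless the patient is old.

```python
import random

# P(event_type | patient_age), copied from the factor above.
# Columns follow the declared order of the event_type states.
EVENT_TYPES = ('ED', 'AP', 'GP', 'DEATH')
CPT = {
    'young':       (0.0, 0.0, 1.0, 0.0),
    'middle_aged': (0.5, 0.0, 0.5, 0.0),
    'old':         (0.3, 0.3, 0.3, 0.1),
}


def sample_event_type(age: str, rng: random.Random) -> str:
    """Draw an event type from the CPT row selected by the patient's age."""
    return rng.choices(EVENT_TYPES, weights=CPT[age], k=1)[0]


# Each row must be a probability distribution.
for age, row in CPT.items():
    assert abs(sum(row) - 1.0) < 1e-9, age

rng = random.Random(0)
young_events = {sample_event_type('young', rng) for _ in range(1000)}
print(young_events)  # {'GP'}: a young patient can only have GP events
```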

Now we create a simulator with one parameter.

In [11]:
sim = Simulator()
number_of_patients = sim.add_parameter('number_of_patients', 2)

Here are the two samplers, one for each entity. Both use the same PGM, but the "event" sampler is conditioned on the "age" field of the "patient" entity: the value of that field is used as evidence for the PGM random variable "patient_age".

In [12]:
patient_sampler = PGMSimSampler(wmc)
event_sampler = PGMSimSampler(wmc, conditions={patient_age: 'age'})  # rv = patient_age, field = age
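The `conditions` argument can be thought of as a mapping from PGM random variables to record fields. The sketch below assumes the field value is read from the parent "patient" record and used directly as evidence. `evidence_from_parent` is a hypothetical helper, and it keys the mapping by variable name (a string) rather than by the random-variable object used above.

```python
from typing import Any, Dict, Mapping


def evidence_from_parent(conditions: Mapping[str, str],
                         parent_record: Mapping[str, Any]) -> Dict[str, Any]:
    """Turn a {random variable name: field name} mapping into PGM evidence.

    Hypothetical helper: each conditioning variable takes the value of the
    named field in the parent record.
    """
    return {rv: parent_record[field] for rv, field in conditions.items()}


conditions = {'patient_age': 'age'}  # rv = patient_age, field = age
patient_record = {'_id_': 1, '_count_': 1, 'age': 'old'}
print(evidence_from_parent(conditions, patient_record))
# {'patient_age': 'old'}
```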

Entity "patient" has one field, "age". Its cardinality will be controlled by the simulator parameter "number_of_patients".

In [13]:
patient = sim.add_entity('patient', sampler=patient_sampler)

patient.add_field_sampled(field_name='age', rv_name='patient_age')

patient.add_cardinality_variable_count(number_of_patients)

Entity "event" has three fields, and its stopping condition is an event of type "DEATH".

In [14]:
event = sim.add_entity('event', parent=patient, foreign_field_name='_parent_id', sampler=event_sampler)

field_event_type = event.add_field_sampled('type', 'event_type')
field_duration = event.add_field_sampled('duration', 'event_duration')
field_duration_since_last = event.add_field_sampled('duration_since_last', 'event_duration_since_last')

event.add_cardinality_field_state(field_event_type, 'DEATH')

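Running the simulator with this configuration may never finish. According to the CPT above, "DEATH" has zero probability for young and middle-aged patients, so their events would never stop, and even an old patient may have many events. We could add another stopping condition to the event entity that limits the number of event records per patient. For example...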
```
event.add_cardinality_fixed_limit(10)
```
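The following sketch shows how a state-based stop ("DEATH") and a fixed cap interact, assuming the cap counts event records per patient. It is plain Python with hypothetical names, not the Synthorus implementation. For a young patient the state condition can never fire, so only the cap ends the loop.

```python
import random
from typing import List

EVENT_TYPES = ('ED', 'AP', 'GP', 'DEATH')
CPT = {  # P(event_type | patient_age), copied from the PGM above
    'young':       (0.0, 0.0, 1.0, 0.0),
    'middle_aged': (0.5, 0.0, 0.5, 0.0),
    'old':         (0.3, 0.3, 0.3, 0.1),
}


def simulate_event_types(age: str, rng: random.Random, fixed_limit: int = 10) -> List[str]:
    """Generate event types for one patient until DEATH or `fixed_limit` events."""
    events: List[str] = []
    while len(events) < fixed_limit:  # fixed cap on event records
        event_type = rng.choices(EVENT_TYPES, weights=CPT[age], k=1)[0]
        events.append(event_type)
        if event_type == 'DEATH':  # state-based stopping condition
            break
    return events


rng = random.Random(1)
young_history = simulate_event_types('young', rng)
old_history = simulate_event_types('old', rng)
print(young_history)  # ten 'GP' events: only the cap stops the loop
print(old_history)    # stops at the first DEATH, or after ten events
```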

Instead, we will add a field that tracks the timing of each event, and then stop on a time limit. Field "time" will be initialised to zero before the first event record is created for each patient. For each record, it is updated by summing "time" (from the previous event record), "duration" and "duration_since_last".

In [15]:
from synthorus.simulator.sim_field_updaters import SumUpdate

time_update = SumUpdate(field_duration, field_duration_since_last, include_self=True)  # include 'time' from the previous event record.
field_time = event.add_field('time', value=0, update=time_update)
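The arithmetic of this update is a running sum. The sketch below, with a hypothetical `event_times` helper, reproduces it with `itertools.accumulate`; it assumes "time" starts at zero for each patient, as described above.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def event_times(durations: Sequence[Tuple[int, int]], initial_time: int = 0) -> List[int]:
    """Running 'time' field: previous time + duration + duration_since_last.

    `durations` holds (duration, duration_since_last) pairs for one patient,
    in record order. `initial` requires Python 3.8 or later.
    """
    steps = [duration + since_last for duration, since_last in durations]
    # accumulate(..., initial=x) yields x first; drop it to keep one time per record.
    return list(accumulate(steps, initial=initial_time))[1:]


print(event_times([(9, 7), (7, 5), (6, 4), (5, 3), (5, 9)]))
# [16, 28, 38, 46, 60]
```

Applied to the ("duration", "duration_since_last") pairs of the first patient in the run output below, it gives the same "time" values: 16, 28, 38, 46 and 60.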

Now we add a stopping condition for when event time exceeds parameter "time_limit".

In [16]:
time_limit = sim.add_parameter('time_limit', 50)
event.add_cardinality_variable_limit(field_time, time_limit)
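A variable-limit rule compares a record field against a simulator parameter. The JSON specification later in this notebook stores such a rule as a `field`, an `op`, and a `limit_field`; the sketch below evaluates a rule of that shape. `variable_limit_reached` is hypothetical, and it is an assumption that the time limit uses the same `>=` operator that the JSON shows for the patient count.

```python
import operator
from typing import Any, Mapping

# Comparison operators a rule might use; the JSON spec in this notebook shows '>='.
OPS = {'>=': operator.ge, '>': operator.gt, '==': operator.eq}


def variable_limit_reached(rule: Mapping[str, str],
                           record: Mapping[str, Any],
                           parameters: Mapping[str, Any]) -> bool:
    """Evaluate a {'field', 'op', 'limit_field'} rule against a record.

    Hypothetical evaluator: compares the record's field value with the
    value of the named simulator parameter.
    """
    compare = OPS[rule['op']]
    return compare(record[rule['field']], parameters[rule['limit_field']])


parameters = {'number_of_patients': 2, 'time_limit': 50}
time_rule = {'field': 'time', 'op': '>=', 'limit_field': 'time_limit'}  # '>=' assumed
print(variable_limit_reached(time_rule, {'time': 46}, parameters))  # False
print(variable_limit_reached(time_rule, {'time': 60}, parameters))  # True
```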

Here we run the simulator with the current state of parameters.

In [17]:
for name, field in sim.parameters.items():
    print(name, '=', field.value)
print()

sim.run(sim_recorder)

number_of_patients = 2
time_limit = 50

Entity: patient ['_id_', '_count_', 'age']
Entity: event ['_id_', '_count_', '_parent_id', 'type', 'duration', 'duration_since_last', 'time']

patient [('_id_', 1), ('_count_', 1), ('age', 'old')]
event [('_id_', 1), ('_count_', 1), ('_parent_id', 1), ('type', 'GP'), ('duration', 9), ('duration_since_last', 7), ('time', 16)]
event [('_id_', 2), ('_count_', 2), ('_parent_id', 1), ('type', 'AP'), ('duration', 7), ('duration_since_last', 5), ('time', 28)]
event [('_id_', 3), ('_count_', 3), ('_parent_id', 1), ('type', 'GP'), ('duration', 6), ('duration_since_last', 4), ('time', 38)]
event [('_id_', 4), ('_count_', 4), ('_parent_id', 1), ('type', 'GP'), ('duration', 5), ('duration_since_last', 3), ('time', 46)]
event [('_id_', 5), ('_count_', 5), ('_parent_id', 1), ('type', 'DEATH'), ('duration', 5), ('duration_since_last', 9), ('time', 60)]
patient [('_id_', 2), ('_count_', 2), ('age', 'young')]
event [('_id_', 6), ('_count_', 1), ('_parent_id', 2),

## Using a Simulator JSON Specification ##

Synthorus uses Pydantic objects and JSON to serialise simulator specifications. (This does not include PGM objects.)

The `synthorus_demos` package includes a module that creates a Pydantic specification of a simulation similar to the one above (with some extra patient fields), together with its samplers.

In [18]:
from synthorus.simulator.sim_entity import SimSampler
from typing import Dict
from synthorus_demos.simulator import example_simulator_spec
from synthorus.simulator.simulator_spec import SimulatorSpec

simulator_spec: SimulatorSpec = example_simulator_spec.make_simulator_spec()
samplers: Dict[str, SimSampler] = example_simulator_spec.make_samplers()

Here is a JSON serialisation of the simulator specification.

Notice that entity samplers are referenced by name in the JSON, which is why the demo samplers are provided as a dictionary mapping each name to its entity sampler.

In [19]:
print(simulator_spec.model_dump_json(indent=2))

{
  "parameters": {
    "number_of_patients": 10,
    "time_limit": 100
  },
  "entities": {
    "patient": {
      "parent": null,
      "sampler": "patient_sampler",
      "id_field_name": "_id_",
      "count_field_name": "_count_",
      "foreign_field_name": null,
      "fields": {
        "age": {
          "type": "sample",
          "rv_name": "patient_age"
        },
        "in_database": {
          "type": "constant",
          "value": true
        },
        "decade": {
          "type": "function",
          "initial_value": 0,
          "inputs": [
            "age"
          ],
          "function": "int(age / 10) + 1"
        }
      },
      "cardinality": [
        {
          "type": "variable",
          "field": "_count_",
          "op": ">=",
          "limit_field": "number_of_patients"
        }
      ]
    },
    "event": {
      "parent": "patient",
      "sampler": "event_sampler",
      "id_field_name": "_id_",
      "count_field_name": "_count_",
      "
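The "decade" field in this specification is a function field whose value is computed from the "age" input by the expression `int(age / 10) + 1`. How Synthorus evaluates such expressions is not shown here. As an assumption, the sketch below uses Python's `eval` with the inputs bound as local names; `evaluate_function_field` is hypothetical, and `eval` should only ever be given trusted expressions.

```python
from typing import Any, Dict, Sequence


def evaluate_function_field(expression: str, inputs: Sequence[str],
                            record: Dict[str, Any]) -> Any:
    """Evaluate a function-field expression using named input fields.

    Hypothetical sketch: binds each input name to its value in `record`
    and exposes only `int` as a builtin. This is not a sandbox; use it
    with trusted expressions only.
    """
    namespace = {name: record[name] for name in inputs}
    return eval(expression, {'__builtins__': {}, 'int': int}, namespace)


print(evaluate_function_field('int(age / 10) + 1', ['age'], {'age': 57}))  # 6
```

For the patient aged 57 in the run below, this gives `int(5.7) + 1 = 6`, matching the "decade" value in that record.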

And here is the dictionary of samplers.

In [20]:
for name, sampler in samplers.items():
    print(name, sampler)

patient_sampler <synthorus.simulator.pgm_sim_sampler.PGMSimSampler object at 0x000002A62EDB0CB0>
event_sampler <synthorus.simulator.pgm_sim_sampler.PGMSimSampler object at 0x000002A62EDD8B30>


A helper function, `make_simulator_from_simulator_spec`, builds a simulator from its specification and a dictionary of samplers.

In [21]:
from synthorus.simulator.make_simulator_from_simulator_spec import make_simulator_from_simulator_spec

sim: Simulator = make_simulator_from_simulator_spec(simulator_spec, samplers)
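Because the JSON stores only each sampler's name, building a simulator needs a name-to-object lookup like the dictionary of samplers above. Here is a minimal sketch of that lookup, with a hypothetical `resolve_samplers` helper and placeholder objects standing in for real samplers; it is not the Synthorus implementation.

```python
from typing import Any, Dict, Mapping


def resolve_samplers(entity_specs: Mapping[str, Mapping[str, Any]],
                     samplers: Mapping[str, Any]) -> Dict[str, Any]:
    """Map each entity name to its sampler object, looked up by sampler name.

    Raises KeyError if an entity refers to a sampler that was not supplied.
    """
    resolved = {}
    for entity_name, spec in entity_specs.items():
        sampler_name = spec['sampler']
        if sampler_name not in samplers:
            raise KeyError(f'entity {entity_name!r} needs unknown sampler {sampler_name!r}')
        resolved[entity_name] = samplers[sampler_name]
    return resolved


entity_specs = {
    'patient': {'parent': None, 'sampler': 'patient_sampler'},
    'event': {'parent': 'patient', 'sampler': 'event_sampler'},
}
demo_samplers = {'patient_sampler': object(), 'event_sampler': object()}
resolved = resolve_samplers(entity_specs, demo_samplers)
print(resolved['event'] is demo_samplers['event_sampler'])  # True
```

Failing fast on an unknown name keeps a typo in the JSON from surfacing later as a confusing error during a run.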


Here is a run of the simulator.

In [22]:
sim.parameters['number_of_patients'].value = 2
sim.parameters['time_limit'].value = 50

sim.run(sim_recorder)

Entity: patient ['_id_', '_count_', 'age', 'in_database', 'decade']
Entity: event ['_id_', '_count_', '_patient__id_', 'type', 'duration', 'duration_since_last', 'time']

patient [('_id_', 1), ('_count_', 1), ('age', 57), ('in_database', True), ('decade', 6)]
event [('_id_', 1), ('_count_', 1), ('_patient__id_', 1), ('type', 'GP'), ('duration', 8), ('duration_since_last', 2), ('time', 10)]
event [('_id_', 2), ('_count_', 2), ('_patient__id_', 1), ('type', 'GP'), ('duration', 9), ('duration_since_last', 3), ('time', 22)]
event [('_id_', 3), ('_count_', 3), ('_patient__id_', 1), ('type', 'GP'), ('duration', 9), ('duration_since_last', 1), ('time', 32)]
event [('_id_', 4), ('_count_', 4), ('_patient__id_', 1), ('type', 'ED'), ('duration', 8), ('duration_since_last', 1), ('time', 41)]
event [('_id_', 5), ('_count_', 5), ('_patient__id_', 1), ('type', 'GP'), ('duration', 2), ('duration_since_last', 3), ('time', 46)]
event [('_id_', 6), ('_count_', 6), ('_patient__id_', 1), ('type', 'GP'), (