# ModelSpec #

This section explains the `ModelSpec` class.


In Synthorus, a `ModelSpec` is a Pydantic object for defining a synthetic data system. It defines things like:
- model metadata
- privacy protection settings
- reference datasources (`DatasourceSpec` objects, each with a `DatasetSpec` object)
- random variables (`ModelRVSpec` objects)
- cross-tables (`ModelCrosstabSpec` objects, defining subsets of random variables)
- entities (`ModelEntitySpec` objects)
- custom parameters and their values.

A `ModelSpec` object can be used for creating cross-tables, reports, and simulators.

In practice, a synthetic data engineer does not usually create a `ModelSpec` object manually. One is typically created from either (a) a "spec file", which is a simplified file format, or (b) JSON from a previously serialized `ModelSpec` object.
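Option (b) relies on the JSON round trip that Pydantic models support. The sketch below illustrates that round trip with `ToyRVSpec` and `ToyModelSpec`, which are hypothetical stand-in classes and not the Synthorus classes. It assumes Pydantic v2, where `model_dump_json` and `model_validate_json` are the standard serialization and parsing methods. Whether `ModelSpec` is normally reloaded this way or through a Synthorus helper is not stated here.

```python
from pydantic import BaseModel, Field


class ToyRVSpec(BaseModel):
    # Hypothetical stand-in for a random variable spec (not ModelRVSpec).
    states: list[str]
    ensure_none: bool = False


class ToyModelSpec(BaseModel):
    # Hypothetical stand-in for a model spec (not ModelSpec).
    name: str = "_unknown_"
    rvs: dict[str, ToyRVSpec] = Field(default_factory=dict)


original = ToyModelSpec(rvs={"A": ToyRVSpec(states=["n", "y"])})

# Serialize to JSON text, then rebuild an equal object from that text.
as_json = original.model_dump_json(indent=2)
restored = ToyModelSpec.model_validate_json(as_json)

print(restored == original)
```

Because validation runs again when the JSON is parsed, a malformed or hand-edited file is rejected at load time rather than later in the pipeline.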

The following example shows a manually created model spec, using an example datasource provided by `synthorus_demos`.

Here is the example datasource.

In [1]:
from synthorus.model.datasource_spec import DatasourceSpec
from synthorus_demos.dataset import example_datasource
from synthorus.dataset import Dataset

# Create a demo datasource spec
datasource_acx: DatasourceSpec = example_datasource.make_datasource_spec_acx()

# Load the dataset so we can show the random variable possible states
dataset_acx: Dataset = datasource_acx.dataset()

datasource_name: str = 'datasource_acx'

# Show the random variables and their possible values.
for rv_name in datasource_acx.rvs:
    print(f'{rv_name}: {sorted(dataset_acx.value_set(rv_name))}')

A: ['n', 'y']
C: ['n', 'y']
X: ['n', 'y']
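The example datasource carries its data as inline CSV (visible in the JSON dump at the end of this section). As a cross-check that uses only the Python standard library, the sketch below derives the same value sets directly from that CSV text. It is an illustration of what the output above means, not a description of how `Dataset.value_set` is implemented.

```python
import csv
import io

# Inline CSV copied from the "inline" field of the datasource's JSON dump.
INLINE_CSV = (
    "A,C,X\ny,n,n\nn,n,y\ny,n,y\ny,y,y\ny,y,n\n"
    "n,y,y\ny,y,y\nn,n,y\nn,n,n\ny,n,n\n"
)

rows = list(csv.DictReader(io.StringIO(INLINE_CSV)))

# Collect the distinct values observed in each column, in sorted order.
value_sets = {name: sorted({row[name] for row in rows}) for name in rows[0]}

for rv_name, values in value_sets.items():
    print(f"{rv_name}: {values}")
```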


To build a model spec, we need to define the random variables of the system. This is done using a
dictionary of `ModelRVSpec` objects.

For this example we have only one datasource, with three random variables, all with the same possible states.
We create the dictionary of random variables using a simple loop.

In [2]:
from synthorus.model.model_spec import ModelRVSpec

# Create random variable specs - all have the same states
rvs = {
    rv_name: ModelRVSpec(states=['n', 'y'])
    for rv_name in ['A', 'C', 'X']
}

We need at least one cross-table defined. This is done using a `ModelCrosstabSpec` object.

In [3]:
from synthorus.model.model_spec import ModelCrosstabSpec

# Define a cross-table (over all random variables)
crosstab = ModelCrosstabSpec(rvs=['A', 'C', 'X'], datasource=datasource_name)

crosstab_name: str = 'crosstab_acx'
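To make the idea of a cross-table concrete, the sketch below counts how often each combination of states for `A`, `C` and `X` occurs in the datasource's inline CSV, using only the standard library. This is a plain contingency count for illustration. Synthorus may apply privacy protection when it builds cross-tables, so its results need not match these raw counts.

```python
import csv
import io
from collections import Counter

# Inline CSV copied from the "inline" field of the datasource's JSON dump.
INLINE_CSV = (
    "A,C,X\ny,n,n\nn,n,y\ny,n,y\ny,y,y\ny,y,n\n"
    "n,y,y\ny,y,y\nn,n,y\nn,n,n\ny,n,n\n"
)
RV_NAMES = ["A", "C", "X"]

rows = csv.DictReader(io.StringIO(INLINE_CSV))

# Count each observed combination of states across the chosen variables.
counts = Counter(tuple(row[name] for name in RV_NAMES) for row in rows)

for states in sorted(counts):
    print(dict(zip(RV_NAMES, states)), counts[states])
```

Combinations that never occur in the data, such as `A='n', C='y', X='n'`, are simply absent from the counter and read back as zero.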

We need at least one entity defined. This is done using a `ModelEntitySpec` object.

In [4]:
from synthorus_demos.model.example_model_spec import sample_rvs
from synthorus.model.model_spec import ModelEntitySpec

# Define an entity (over all random variables)
entity = ModelEntitySpec(fields=sample_rvs('A', 'C', 'X'))

entity_name: str = 'entity_acx'

Now we can put it all together in a `ModelSpec` object.

In [5]:
from synthorus.model.model_spec import ModelSpec

model_spec = ModelSpec(
    datasources={datasource_name: datasource_acx},
    rvs=rvs,
    crosstabs={crosstab_name: crosstab},
    entities={entity_name: entity},
)

Here is a JSON representation of our model.

In [6]:
print(model_spec.model_dump_json(indent=2))


{
  "name": "_unknown_",
  "author": "_unknown_",
  "comment": "",
  "roots": [],
  "rng_n": 4,
  "pgm_crosstabs": "noisy",
  "datasources": {
    "datasource_acx": {
      "sensitivity": 1.0,
      "rvs": [
        "A",
        "C",
        "X"
      ],
      "dataset_spec": {
        "type": "csv",
        "weight": null,
        "rv_map": null,
        "rv_define": {},
        "input": {
          "type": "inline",
          "inline": "A,C,X\ny,n,n\nn,n,y\ny,n,y\ny,y,y\ny,y,n\nn,y,y\ny,y,y\nn,n,y\nn,n,n\ny,n,n\n"
        },
        "sep": ",",
        "header": true,
        "skip_blank_lines": true,
        "skip_initial_space": false
      },
      "non_distribution_rvs": []
    }
  },
  "rvs": {
    "A": {
      "states": [
        "n",
        "y"
      ],
      "ensure_none": false
    },
    "C": {
      "states": [
        "n",
        "y"
      ],
      "ensure_none": false
    },
    "X": {
      "states": [
        "n",
        "y"
      ],
      "ensure_none": false
    }