**For correct rendering, view this notebook in [nbviewer](https://nbviewer.org/github/markuskrecik/preference-dynamics-learning/blob/main/notebooks/10_data_generation.ipynb)**

# Synthetic Data Generation

This project solves the inverse problem for a nonlinear system of ordinary differential equations (ODEs) with neural networks. In an inverse problem, the goal is to recover the parameters of a system from observed data.

The system at hand describes the behavior of people through their desires and efforts ([Krecik, 2025a](https://doi.org/10.1007/s10614-025-10895-3), [Krecik, 2025b](https://www.ssrn.com/abstract=5303381)) given a set of input parameters $\mathbf{g}, \boldsymbol{\mu}, \Pi, \Gamma$:
$$
\begin{aligned}
\mathbf{v}'(t) &= \mathbf{g} - \Pi \cdot \mathbf{a}(t),\\
\mathbf{m}'(t) &= \mathbf{u}(t) - \Gamma \cdot \mathbf{a}(t),
\end{aligned}
$$
where the maxima are taken element-wise:
- $\mathbf{u}(t) = \max(\mathbf{v}(t), \boldsymbol{\mu})$ (desires),
- $\mathbf{a}(t) = \max(\mathbf{m}(t), \mathbf{0})$ (efforts, always $\geq 0$).
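
For concreteness, the right-hand side can be written as a small NumPy function. This is a sketch, not the project's solver code: the function name `rhs` and the stacked state layout `[v, m]` are assumptions, and $\Pi$, $\Gamma$ are taken to be $n \times n$ arrays for $n$ actions.

```python
import numpy as np


def rhs(state: np.ndarray, g: np.ndarray, mu: np.ndarray,
        Pi: np.ndarray, Gamma: np.ndarray) -> np.ndarray:
    """Time derivative of the stacked state [v, m] (hypothetical layout)."""
    n = g.shape[0]
    v, m = state[:n], state[n:]
    u = np.maximum(v, mu)    # desires: element-wise max(v, mu)
    a = np.maximum(m, 0.0)   # efforts: element-wise max(m, 0), never negative
    dv = g - Pi @ a          # v' = g - Pi a
    dm = u - Gamma @ a       # m' = u - Gamma a
    return np.concatenate([dv, dm])


# One action (n = 1): v = 2 exceeds mu = 0, so u = 2; m = -1 is clipped to a = 0.
one = np.array([1.0])
print(rhs(np.array([2.0, -1.0]), g=one, mu=np.array([0.0]),
          Pi=one.reshape(1, 1), Gamma=one.reshape(1, 1)))
# → [1. 2.]
```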

Solving this inverse problem with conventional methods is generally hard because of the high dimensionality of the parameter space; neural networks are particularly well suited to it.
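
To make "high dimensionality" concrete: assuming $\Pi$ and $\Gamma$ are full $n \times n$ matrices, each sample is determined by $2n + 2n^2$ parameters ($\mathbf{g}$, $\boldsymbol{\mu}$, $\Pi$, $\Gamma$), plus $2n$ initial values $\mathbf{v}(0)$, $\mathbf{m}(0)$ if those are treated as unknowns too. The helper below is hypothetical (not part of `preference_dynamics`) and only does this count.

```python
def n_unknowns(n_actions: int, include_initial_conditions: bool = True) -> int:
    """Count scalar unknowns, assuming Pi and Gamma are full n x n matrices."""
    n = n_actions
    count = n + n + n * n + n * n  # g, mu, Pi, Gamma
    if include_initial_conditions:
        count += 2 * n             # v(0), m(0)
    return count


for n in (1, 2, 3):
    print(n, n_unknowns(n, include_initial_conditions=False), n_unknowns(n))
# → 1 4 6
# → 2 12 16
# → 3 24 30
```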

**This notebook:**
1. Generates and saves a dataset of synthetic data for $n\in \{1,2,3\}$ actions
2. Gives a first glimpse at the generated data

```python
# All notebooks assume the `Notebook File Root=${workspaceFolder}` setting (in VS Code), or similar
%load_ext autoreload
%autoreload 2

from plotly.offline import init_notebook_mode

init_notebook_mode(connected=True)

from rich import print

from preference_dynamics.schemas import SolverConfig
from preference_dynamics.solver import create_default_sampler, generate_batch
from preference_dynamics.data import DataConfig, DataManager
```

## Data Generation

I generate a batch of time series samples by simulating the ODE.
`n_actions` sets the number of actions; the datasets for different values of `n_actions` are saved in separate folders (`data/n1`, `data/n2`, `data/n3`).
An IO handler can be chosen to save the data in different formats.

The sampler draws random parameters and initial conditions from a distribution (uniform by default) and uses an eigenvalue-based heuristic to check the stability of the ODE.
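
The project's actual sampling ranges and stability heuristic live in `create_default_sampler` and are not reproduced here. As a rough illustration of the idea only, the sketch below draws everything uniformly from a hypothetical range $[0, 1]$ and accepts a draw if the Jacobian of the system's linear regime ($\mathbf{v} > \boldsymbol{\mu}$, $\mathbf{m} > \mathbf{0}$, where $\mathbf{u} = \mathbf{v}$ and $\mathbf{a} = \mathbf{m}$),
$$
J = \begin{pmatrix} 0 & -\Pi \\ I & -\Gamma \end{pmatrix},
$$
has only eigenvalues with negative real part. Both the ranges and this particular criterion are assumptions, and the helper names `is_stable` and `sample_stable` are hypothetical.

```python
import numpy as np


def is_stable(Pi: np.ndarray, Gamma: np.ndarray) -> bool:
    """Hypothetical heuristic: linear-regime Jacobian has only eigenvalues with Re < 0."""
    n = Pi.shape[0]
    # In the regime v > mu, m > 0 the system is linear: v' = g - Pi m, m' = v - Gamma m
    J = np.block([[np.zeros((n, n)), -Pi],
                  [np.eye(n), -Gamma]])
    return bool(np.all(np.linalg.eigvals(J).real < 0))


def sample_stable(n: int, seed: int = 42, low: float = 0.0, high: float = 1.0,
                  max_tries: int = 1000) -> dict:
    """Rejection-sample uniform parameters and initial conditions (hypothetical ranges)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        draw = {
            "g": rng.uniform(low, high, size=n),
            "mu": rng.uniform(low, high, size=n),
            "Pi": rng.uniform(low, high, size=(n, n)),
            "Gamma": rng.uniform(low, high, size=(n, n)),
            "v0": rng.uniform(low, high, size=n),
            "m0": rng.uniform(low, high, size=n),
        }
        if is_stable(draw["Pi"], draw["Gamma"]):
            return draw
    raise RuntimeError(f"no stable draw found in {max_tries} tries")


params = sample_stable(n=2, seed=42)
print({key: value.shape for key, value in params.items()})
```

Rejection sampling keeps the accepted draws uniform over the stable region, at the cost of wasted draws when that region is small.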

```python
n_actions = 2  # <=3 for reasonable runtime
n_samples = 10000
seed = 42

data_dir = f"data/n{n_actions}"

sampler = create_default_sampler(n_actions=n_actions, random_seed=seed)

solver_config = SolverConfig(
    time_span=(0.0, 200.0),
    n_time_points=201,
)

print(f"Generating {n_samples} samples for n={n_actions} actions...")
batch = generate_batch(
    n_samples,
    sampler,
    solver_config,
    n_jobs=-1,
    show_progress=True,
    debug=False,
)

# DataManager handles consistent saving of raw and processed data, as well as transformations.
# Data is stored as json by default, but other IOHandlers can be chosen.
config = DataConfig(data_dir=data_dir, load_if_exists=False)
dm = DataManager(config=config)
dm.save_raw(batch)
```

```text
Generating samples: 100%|██████████| 10000/10000 [09:25<00:00, 17.70it/s]
```
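
`generate_batch` wraps the project's own solver, which this notebook treats as a black box. For intuition only, integrating a single sample on the grid set by `SolverConfig` (`time_span=(0.0, 200.0)`, `n_time_points=201`, i.e. one output point per time unit) could look like the sketch below. It assumes a fixed-step classic Runge-Kutta (RK4) scheme with 10 substeps per output interval; the project's solver may well use a different, e.g. adaptive, method.

```python
import numpy as np


def rhs(y, g, mu, Pi, Gamma):
    n = g.shape[0]
    v, m = y[:n], y[n:]
    u = np.maximum(v, mu)   # desires
    a = np.maximum(m, 0.0)  # efforts
    return np.concatenate([g - Pi @ a, u - Gamma @ a])


def simulate(g, mu, Pi, Gamma, v0, m0, time_span=(0.0, 200.0),
             n_time_points=201, substeps=10):
    """Classic RK4 with `substeps` steps between consecutive output times."""
    t = np.linspace(time_span[0], time_span[1], n_time_points)
    dt = (t[1] - t[0]) / substeps
    y = np.concatenate([v0, m0]).astype(float)
    out = np.empty((n_time_points, y.size))
    out[0] = y
    for i in range(1, n_time_points):
        for _ in range(substeps):
            k1 = rhs(y, g, mu, Pi, Gamma)
            k2 = rhs(y + 0.5 * dt * k1, g, mu, Pi, Gamma)
            k3 = rhs(y + 0.5 * dt * k2, g, mu, Pi, Gamma)
            k4 = rhs(y + dt * k3, g, mu, Pi, Gamma)
            y = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i] = y
    return t, out  # out[:, :n] is v(t), out[:, n:] is m(t)


one = np.array([1.0])
t, traj = simulate(g=one, mu=np.array([0.0]), Pi=one.reshape(1, 1),
                   Gamma=one.reshape(1, 1), v0=np.array([0.5]), m0=np.array([0.5]))
print(t.shape, traj.shape, traj[-1])
```

With $g = \Pi = \Gamma = 1$ and $\mu = 0$, the trajectory settles at the fixed point $v = m = 1$, where $\Pi a = g$ and $u = \Gamma a$.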


Let's take a look under the hood. Here is one of the time series samples we just generated:

```python
sample = batch[0]

print(
    "Parameters and initial conditions generating the time series:\n\n",
    repr(sample.config.ode.parameters),
    "\n\n",
    repr(sample.config.ode.initial_conditions),
)
print("The created time series instance contains all information for reconstruction:\n", sample)
```

## Summary

We've successfully generated synthetic time series data for the preference dynamics ODE system.

**Future extensions:**
- Add noise to the time series
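
One possible shape for this extension, sketched under assumptions rather than taken from the project: add zero-mean Gaussian noise whose standard deviation is a fraction `rel_sigma` of each channel's own standard deviation. The function name `add_gaussian_noise` and the relative scaling are hypothetical choices.

```python
from typing import Optional

import numpy as np


def add_gaussian_noise(series: np.ndarray, rel_sigma: float = 0.05,
                       seed: Optional[int] = None) -> np.ndarray:
    """Return a noisy copy of a (time, channel) array; the input is not modified."""
    rng = np.random.default_rng(seed)
    scale = rel_sigma * series.std(axis=0, keepdims=True)  # per-channel noise level
    return series + rng.normal(0.0, 1.0, size=series.shape) * scale


clean = np.column_stack([np.linspace(0.0, 1.0, 201), np.sin(np.linspace(0.0, 10.0, 201))])
noisy = add_gaussian_noise(clean, rel_sigma=0.05, seed=0)
print(noisy.shape, np.abs(noisy - clean).max())
```

Scaling per channel keeps the signal-to-noise ratio comparable across desires and efforts, whose magnitudes can differ.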

**Next steps** (see `20_data_exploration.ipynb`):
- Explore the time series visually
- Clean data and explore parameter distributions