# Synthetic Data Generation

TD2C is a supervised method, so it needs ground-truth labels to learn from. We generate this labeled data using a library of **Multivariate Nonlinear Autoregressive (NAR)** processes.
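To make the link between synthetic data and labels concrete, here is a minimal sketch of how a known generating graph can be turned into supervised labels. The `adjacency` matrix and the `edge_labels` helper are hypothetical and exist only for illustration; they are not part of the TD2C API, and TD2C's actual labeling scheme (for example, how it treats lagged variables) may differ.

```python
import numpy as np

# Hypothetical ground-truth graph for 3 variables (an assumption, not TD2C output):
# adjacency[i, j] == 1 means variable i drives variable j.
adjacency = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])


def edge_labels(adjacency):
    """Return (cause, effect, label) for every ordered pair of distinct variables."""
    n = adjacency.shape[0]
    return [
        (i, j, int(adjacency[i, j]))
        for i in range(n)
        for j in range(n)
        if i != j
    ]


labels = edge_labels(adjacency)
for cause, effect, label in labels:
    print(f"{cause} -> {effect}: {label}")
```

Because the generating graph is fixed in advance, every ordered pair of variables gets a label for free, which is what a supervised learner needs.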

Each process defines a specific way the variables interact over time. For example, **Process 1** includes nonlinear feedback loops:

$$ Y_{t+1}[j] = -0.4 \frac{3 - Y_t^2}{1 + Y_t^2} + \dots $$
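To get a feel for this nonlinearity, the sketch below iterates a toy scalar recurrence built only from the first term shown above, plus Gaussian noise. It is not Process 1 itself: the elided terms and the coupling between variables are left out, and `visible_term`, `simulate_toy_nar`, and the `noise_std` value are illustrative assumptions rather than anything taken from TD2C.

```python
import numpy as np


def visible_term(y):
    """First term of the update shown above: -0.4 * (3 - y^2) / (1 + y^2)."""
    return -0.4 * (3.0 - y**2) / (1.0 + y**2)


def simulate_toy_nar(n_steps=200, noise_std=0.1, seed=0):
    """Iterate y[t+1] = visible_term(y[t]) + Gaussian noise, starting from y[0] = 0."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n_steps)
    for t in range(n_steps - 1):
        y[t + 1] = visible_term(y[t]) + rng.normal(0.0, noise_std)
    return y


series = simulate_toy_nar()
print(series[:5])
```

The term can be rewritten as $-0.4\left(-1 + \frac{4}{1 + y^2}\right)$, so it always stays between $-1.2$ and $0.4$. The feedback is bounded, and the toy series cannot diverge no matter how large a noise shock is.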

```python
from td2c.data_generation.builder import TSBuilder
import matplotlib.pyplot as plt

N_VARIABLES = 5  # number of variables in each generated series

# Generate data from Process 1 (nonlinear)
builder = TSBuilder(
    n_variables=N_VARIABLES,
    observations_per_time_series=200,
    processes_to_use=[1],
    noise_dist='gaussian',
    verbose=True,
)
builder.build()

# Extract observations, indexed by process ID and then by sample index
obs_dict = builder.get_generated_observations()
time_series = obs_dict[1][0]  # Process 1, Sample 0

# Visualize the first 100 time steps of every variable
plt.figure(figsize=(12, 6))
plt.plot(time_series[:100])
plt.title("Sample Multivariate Time Series (Process 1)")
plt.xlabel("Time ($t$)")
plt.ylabel("Value ($Y_t$)")
plt.legend([f"Var {i}" for i in range(N_VARIABLES)], loc='upper right')
plt.show()
```