# Dataset creation for PINN training

In this notebook, we create the datasets needed to train the component PINNs. The steps below mirror `generate_dataset()` in `pinnsim.dataset_functions.dataset_generation`, which can also be called directly with a config file, as shown at the end of this notebook. The simulated datasets are stored in the `data.learning_data` folder.

### Imports

In [1]:
import torch

from pinnsim.dataset_functions.dataset_generation import sample_dataset
from pinnsim.dataset_functions.dataset_generation import generate_dataset
from pinnsim.configurations.dataset_config import define_dataset_config


from pinnsim.numerics import PredictorODE
from pinnsim.power_system_models import VoltageProfilePolynomial, GeneratorModel

Could not find GLIMDA.


### Dataset configurations

To keep a traceable record of how the datasets are generated, we specify dataset config files. These include the id of the generator, the type of dataset, and optionally a seed for the random processes. `define_dataset_config` sets the sampling approaches for the time variable $t$, the initial condition $x_0$, and the voltage parametrisations $\Xi$.

In [2]:
dataset_config = define_dataset_config(dataset_type="train", generator_id="ieee9_1", seed=192444352)

### Sampling process

Based on the config (here we set the number of operating points to 100 to keep the example dataset small), `sample_dataset` randomly samples all relevant input variables according to the specified sampling strategies.

In [3]:
dataset_config["n_operating_points"] = 100
dataset = sample_dataset(dataset_config=dataset_config)
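
As a rough illustration of why the seed matters, the sketch below draws a few input variables from a config dictionary with a dedicated seeded generator. It is not the `pinnsim` implementation: the helper `sample_inputs_sketch`, the keys `seed`, `time_max` and `state_bounds`, and the uniform sampling strategy are assumptions made for this example, and plain Python is used instead of `torch` to keep it dependency-free.

```python
import random


def sample_inputs_sketch(config):
    """Draw (time, initial state) pairs uniformly; hypothetical stand-in for a sampler."""
    rng = random.Random(config["seed"])  # dedicated generator, seeded from the config
    samples = []
    for _ in range(config["n_operating_points"]):
        time = rng.uniform(0.0, config["time_max"])
        state_initial = [rng.uniform(low, high) for low, high in config["state_bounds"]]
        samples.append({"time": time, "state_initial": state_initial})
    return samples


# Hypothetical config; only "n_operating_points" mirrors a key used in this notebook.
config_sketch = {
    "seed": 192444352,
    "n_operating_points": 5,
    "time_max": 1.0,
    "state_bounds": [(-1.0, 1.0), (-0.1, 0.1)],
}

first_draw = sample_inputs_sketch(config_sketch)
second_draw = sample_inputs_sketch(config_sketch)
print(first_draw == second_draw)  # same seed, same samples
```

Because the generator is created from the seed inside the function, repeated calls with the same config yield identical samples, which is what makes a stored config a usable record of how a dataset was produced.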

### Dataset simulation

The sampling above is quick to execute; the following step is more time-consuming because we need to simulate the trajectory for each data point and retain the value at the given time $t$. To this end, we use an ODE solver (see `PredictorODE`), which takes a component model (`GeneratorModel`) and a voltage profile as inputs. The latter transforms the voltage parametrisation values $\Xi$ into a complex voltage $\bar{v}(t) = V(t) \exp(j\theta(t))$ for a given time value $t$. For later use, we also store the values of $V$ and $\theta$ along with the state result $x(t)$ in the dataset.
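
To make the role of the voltage profile concrete, here is a minimal sketch of how a polynomial parametrisation could be turned into a complex voltage. The coefficient layout (one list for the magnitude, one for the angle, both in ascending powers of $t$) and the function name `polynomial_voltage_sketch` are assumptions for illustration; `VoltageProfilePolynomial` may order or scale its parameters differently.

```python
import cmath


def polynomial_voltage_sketch(time, magnitude_coeffs, angle_coeffs):
    """Evaluate V(t) and theta(t) as polynomials and form the complex voltage."""
    V = sum(c * time**k for k, c in enumerate(magnitude_coeffs))
    theta = sum(c * time**k for k, c in enumerate(angle_coeffs))
    v_complex = V * cmath.exp(1j * theta)  # v(t) = V(t) * exp(j * theta(t))
    return theta, V, v_complex


# Second-order example: V(t) = 1.0 + 0.05 t - 0.02 t^2, theta(t) = 0.1 + 0.5 t
theta_sketch, V_sketch, v_sketch = polynomial_voltage_sketch(
    time=0.5,
    magnitude_coeffs=[1.0, 0.05, -0.02],
    angle_coeffs=[0.1, 0.5, 0.0],
)
print(theta_sketch, V_sketch, abs(v_sketch))
```

The magnitude and phase of the resulting complex number recover $V(t)$ and $\theta(t)$, which is why storing these two values alongside the state is sufficient for later use.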

In [4]:
generator = GeneratorModel(generator_config=dataset["generator_config"])
voltage_profile = VoltageProfilePolynomial(order_polynomial=2)

simulator = PredictorODE(component=generator, voltage_profile=voltage_profile)

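# Prepend t = 0 to every sample time so that each solve runs from 0 to t.
time_extended = torch.hstack([dataset["time"] * 0.0, dataset["time"]])

# Actual simulation: one solve per data point; the [1:, :] slice drops the t = 0 row.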
with torch.no_grad():
    state_results = torch.vstack(
        [
            simulator.predict_state(
                time=time_extended[ii : ii + 1, :].reshape((-1, 1)),
                state=dataset["state_initial"][ii : ii + 1, :],
                control_input=dataset["control_input"][ii : ii + 1, :],
                voltage_parametrisation=dataset["voltage_parametrisation"][
                    ii : ii + 1, :
                ],
            )[1:, :]
            for ii in range(dataset["time"].shape[0])
        ]
    )

theta, V = voltage_profile.get_voltage(
    time=dataset["time"], voltage_parametrisation=dataset["voltage_parametrisation"]
) 

assert state_results.shape == dataset["state_initial"].shape

dataset.update(
    {
        "state_result": state_results,
        "theta_result": theta,
        "V_result": V,
    }
)

print(dataset["state_result"])

tensor([[ 1.0560e+00,  0.0000e+00, -2.4483e+00,  1.4912e-02],
        [ 1.0560e+00,  0.0000e+00, -7.5898e-01, -9.4167e-03],
        [ 1.0560e+00,  0.0000e+00, -1.3451e+00,  9.0891e-03],
        [ 1.0560e+00,  0.0000e+00, -6.0950e-01,  6.2388e-03],
        [ 1.0560e+00,  0.0000e+00,  1.1190e+00,  1.1468e-03],
        [ 1.0560e+00,  0.0000e+00, -2.3844e+00,  9.5353e-03],
        [ 1.0560e+00,  0.0000e+00, -2.1311e-02,  3.8575e-03],
        [ 1.0560e+00,  0.0000e+00,  1.0136e+00, -4.3222e-03],
        [ 1.0560e+00,  0.0000e+00,  3.0993e+00, -4.4377e-03],
        [ 1.0560e+00,  0.0000e+00, -1.5956e+00, -9.5151e-03],
        [ 1.0560e+00,  0.0000e+00, -2.1884e+00, -5.9126e-03],
        [ 1.0560e+00,  0.0000e+00,  2.6418e+00, -4.4578e-03],
        [ 1.0560e+00,  0.0000e+00,  8.9996e-01,  2.9204e-04],
        [ 1.0560e+00,  0.0000e+00, -5.4588e-03, -8.3675e-03],
        [ 1.0560e+00,  0.0000e+00, -2.0531e+00,  6.6959e-03],
        [ 1.0560e+00,  0.0000e+00, -3.2084e+00,  4.2540e-03],
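
The `time_extended` construction and the `[1:, :]` slice in the cell above follow a common pattern: the solver is asked for the state at the two time points $[0, t]$, and the row for $t = 0$, which only repeats the initial condition, is dropped. The dependency-free sketch below shows that pattern with a hand-written RK4 integrator on the toy ODE $\dot{x} = -x$. Both the integrator `rk4_solve_sketch` and the toy dynamics are stand-ins assumed for illustration; they are not `PredictorODE` or the generator model.

```python
import math


def rk4_solve_sketch(rhs, x0, time_points, n_substeps=100):
    """Integrate dx/dt = rhs(t, x) and return the state at every entry of time_points."""
    states = [x0]
    t, x = time_points[0], x0
    for t_next in time_points[1:]:
        h = (t_next - t) / n_substeps
        for _ in range(n_substeps):
            k1 = rhs(t, x)
            k2 = rhs(t + h / 2, x + h / 2 * k1)
            k3 = rhs(t + h / 2, x + h / 2 * k2)
            k4 = rhs(t + h, x + h * k3)
            x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t = t + h
        states.append(x)
    return states


sample_times = [0.2, 0.5, 1.0]     # one target time per data point
initial_states = [1.0, 2.0, -1.0]  # one initial condition per data point

state_results_sketch = []
for t_target, x0 in zip(sample_times, initial_states):
    trajectory = rk4_solve_sketch(lambda t, x: -x, x0, [0.0, t_target])
    state_results_sketch.extend(trajectory[1:])  # drop the t = 0 entry, keep x(t_target)

print(state_results_sketch)
```

After the slice there is exactly one result per data point, which is the property the shape assertion in the notebook cell checks.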
        

For the collocation dataset, we do not need to run the simulation because the values of `state_result` are not required. Hence, we simply set them to 0 and compute the result values of $V$ and $\theta$, which are cheap to evaluate.

In [5]:
dataset_config_collocation = define_dataset_config(dataset_type="collocation", generator_id="ieee9_1", seed=5648431)
dataset_collocation = sample_dataset(dataset_config=dataset_config_collocation)
theta_collocation, V_collocation = voltage_profile.get_voltage(
    dataset_collocation["time"], dataset_collocation["voltage_parametrisation"]
)
dataset_collocation.update(
    {
        "state_result": torch.zeros(dataset_collocation["state_initial"].shape),
        "theta_result": theta_collocation,
        "V_result": V_collocation,
    }
)

To save a dataset, call `pinnsim.dataset_functions.dataset_handling.save_dataset_raw()` and specify the path. By default, we suggest the `data` folder in the project.

## Datasets needed for training (can take a while, ~20 min)

Run the following cell to simulate the datasets required for training the PINNs that are needed to run PINNSim on the IEEE 9-bus system. Pre-simulated versions are available in `data.learning_data`.

In [6]:
import itertools

from pinnsim import LEARNING_DATA_PATH

seeds = torch.randint(
    low=100000,
    high=100000000,
    size=(9,),
    generator=torch.Generator().manual_seed(21643131),
).tolist()

for seed, (generator_id, dataset_type) in zip(
    seeds,
    itertools.product(
        ["ieee9_1", "ieee9_2", "ieee9_3"], ["train", "test", "collocation"]
    ),
):
    dataset_config = define_dataset_config(
        dataset_type=dataset_type, generator_id=generator_id, seed=seed
    )
    generate_dataset(dataset_config, data_path=LEARNING_DATA_PATH)


Saved dataset "train_ieee9_1".
Created and saved dataset train_ieee9_1 in 108.64 s.
Saved dataset "test_ieee9_1".
Created and saved dataset test_ieee9_1 in 179.96 s.
Saved dataset "collocation_ieee9_1".
Created and saved dataset collocation_ieee9_1 in 0.09 s.
Saved dataset "train_ieee9_2".
Created and saved dataset train_ieee9_2 in 148.06 s.
Saved dataset "test_ieee9_2".
Created and saved dataset test_ieee9_2 in 235.21 s.
Saved dataset "collocation_ieee9_2".
Created and saved dataset collocation_ieee9_2 in 0.07 s.
Saved dataset "train_ieee9_3".
Created and saved dataset train_ieee9_3 in 165.99 s.
Saved dataset "test_ieee9_3".
Created and saved dataset test_ieee9_3 in 264.15 s.
Saved dataset "collocation_ieee9_3".
Created and saved dataset collocation_ieee9_3 in 0.07 s.
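
The loop above relies on `zip` pairing each of the nine seeds with one of the nine (generator, dataset type) combinations produced by `itertools.product`, with the dataset type varying fastest. The dependency-free sketch below reproduces only that pairing. It uses `random.Random` in place of `torch.randint`, so the seed values differ from those of the real run, and the name format is inferred from the log output above.

```python
import itertools
import random

# Stand-in for the torch-based seed generation; same range, different values.
rng = random.Random(21643131)
seeds_sketch = [rng.randrange(100000, 100000000) for _ in range(9)]

combinations = list(
    itertools.product(
        ["ieee9_1", "ieee9_2", "ieee9_3"], ["train", "test", "collocation"]
    )
)

dataset_names = []
for seed, (generator_id, dataset_type) in zip(seeds_sketch, combinations):
    # Each combination receives its own seed; here we only record the name.
    dataset_names.append(f"{dataset_type}_{generator_id}")

print(dataset_names[:3])  # ['train_ieee9_1', 'test_ieee9_1', 'collocation_ieee9_1']
```

Note that `zip` stops at the shorter input, so adding a generator or dataset type without increasing the number of seeds would silently skip the extra combinations.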
