# Guide for simulating with AlphaPEM

## Purpose
This notebook shows two ways to run AlphaPEM simulations:

1) The Python API `run_alphapem_from_df` for small interactive runs
2) The parallel CLI script `scripts/run_sampler_batch_parallel.py` for larger runs

You will:
- Run three configurations and see how errors are handled
- Create a Latin Hypercube (LHS) design with 20 samples and save it in `data/designs`
- Launch a parallel batch for 10 configurations and inspect the outputs

> Note: Run this notebook from the `notebooks/` directory of the repository so that the relative paths (which start at `..`) resolve to the repository root.

In [None]:
from pathlib import Path
import os, sys, json
import numpy as np
import pandas as pd
from scipy.stats import qmc  # For LHS
import subprocess, shlex 

In [5]:
# Relative path to repo root
parent = Path("..")

# Make src importable
if str(parent.resolve()) not in sys.path:
    sys.path.append(str(parent.resolve()))

# Paths (lowercase names)
alpha_pem_root        = str(parent / "external" / "AlphaPEM")
param_config_yaml     = str(parent / "configs" / "param_config.yaml")
simulator_defaults_yaml = str(parent / "configs" / "simulator_defaults.yaml")
raw_dir               = parent / "data" / "raw"
designs_dir           = parent / "data" / "designs"

# Ensure directories exist
raw_dir.mkdir(parents=True, exist_ok=True)
designs_dir.mkdir(parents=True, exist_ok=True)

# Imports from src
from src.sampling.sampler import run_alphapem_from_df
from src.sampling.bounds import load_param_config, validate_sample_against_spec, SampleValidationError

# Optional: short path display
print("Parameter configurations:", param_config_yaml)
print("Simulator defaults (other fixed params):", simulator_defaults_yaml)
print("AlphaPEM root:", alpha_pem_root)
print("Raw directory", raw_dir)
print("Design directory", designs_dir)

Parameter configurations: ..\configs\param_config.yaml
Simulator defaults (other fixed params): ..\configs\simulator_defaults.yaml
AlphaPEM root: ..\external\AlphaPEM
Raw directory ..\data\raw
Design directory ..\data\designs


## Prerequisites

- The AlphaPEM code is present in `external/AlphaPEM` and importable.
- Your YAML files exist at `configs/param_config.yaml` and `configs/simulator_defaults.yaml`.

If you use a virtual environment, activate it before starting the Jupyter server.

## Controlling the simulation conditions and parameter ranges

The file `configs/param_config.yaml` is the single source of truth for how we define inputs, ranges, types, fixed values, and derived relationships among parameters. Every simulator call in this project uses it to load and validate the parameter space before running AlphaPEM. Concretely:

* `load_param_config(path)` parses `param_config.yaml` into a `SamplingSpec` that contains names, types, numeric bounds for continuous and integer parameters, allowed categories for categoricals, fixed values, and derived expressions.
* `validate_sample_against_spec(sample, spec)` first applies any derived expressions declared in the YAML, then checks that the sample adheres to bounds and types and enforces fixed parameters. It returns the augmented sample; if something is wrong, it raises a `SampleValidationError`.
* Example of a derived expression: `Pc_des` can be derived as `Pa_des - 20000`. This gets computed automatically during validation as long as your YAML declares it as fixed with a `derived` expression.
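
To make the derived rule concrete, here is a minimal sketch of how an expression such as `Pa_des - 20000` could be evaluated against a sample. The helper `eval_derived` is hypothetical and is not the implementation in `bounds.py`; it only assumes that derived expressions are simple arithmetic over other parameter names.

```python
import ast
import operator

# Arithmetic operators accepted in a derived expression (assumption for this sketch)
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def eval_derived(expr, sample):
    """Evaluate a simple arithmetic expression using values from `sample`.

    Hypothetical helper: only numbers, parameter names, and + - * / are accepted.
    """
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return sample[node.id]  # raises KeyError if the parameter is missing
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"Unsupported expression: {expr!r}")

    return _eval(ast.parse(expr, mode="eval"))

sample = {"Pa_des": 230000.0}
sample["Pc_des"] = eval_derived("Pa_des - 20000", sample)
print(sample["Pc_des"])  # 210000.0
```

Walking the syntax tree instead of calling `eval` keeps the sketch from executing arbitrary code that might end up in a YAML file.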

### What must be in `param_config.yaml`?

The YAML must list the 19 AlphaPEM input features we use in this project:

1. The 7 operating conditions
   `Tfc, Pa_des, Pc_des, Sa, Sc, Phi_a_des, Phi_c_des`

2. The undetermined physical parameters used for sensitivity analysis
   `epsilon_gdl, tau, epsilon_mc, epsilon_c, e, Re, i0_c_ref, kappa_co, kappa_c`

3. Project-specific physical constants that stay fixed across runs (for example, `a_slim, b_slim, a_switch`)

For any feature whose sensitivity you want to explore, provide:

* `type` one of `continuous`, `integer`, or `categorical`
* a numeric range `low` and `high` for continuous or integer types
* or `values` for categorical types

You can also fix parameters directly in the YAML:

* Example: fix `Sa` to 1.3 across all samples
* Example: derive `Pc_des` from `Pa_des` using `derived: "Pa_des - 20000"`

A minimal YAML skeleton looks like this:

```yaml
parameters:
  # -------- Operating conditions --------
  - name: Tfc
    type: continuous
    low: 333
    high: 363
    fixed: false

  - name: Pc_des
    type: continuous
    fixed: true
    derived: "Pa_des - 20000"   # derived rule applied by bounds.py

  - name: Sa
    type: continuous
    fixed: true
    value: 1.3                   # example of fixing Sa for all runs
```

The samplers and runners fill missing fixed values, compute derived values, and enforce bounds using this spec before any simulation starts.
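
As a rough illustration of what "enforce bounds" means for entries like those in the YAML skeleton above, the sketch below checks single values against spec entries written as plain dicts. The function `check_value` and its messages are hypothetical; the project's validator raises `SampleValidationError` instead of returning a list, and its internals may differ.

```python
# Spec entries as plain dicts, using the same fields as the YAML skeleton
spec_entries = {
    "Tfc": {"type": "continuous", "low": 333, "high": 363, "fixed": False},
    "e": {"type": "integer", "low": 3, "high": 5, "fixed": False},
    "Sa": {"type": "continuous", "fixed": True, "value": 1.3},
}

def check_value(name, value, entry):
    """Return a list of problems for one parameter (an empty list means valid)."""
    problems = []
    if entry.get("fixed", False):
        # Fixed parameters must match the declared value exactly
        if "value" in entry and value != entry["value"]:
            problems.append(f"{name} is fixed to {entry['value']}, got {value}")
        return problems
    if entry["type"] == "integer" and float(value) != int(value):
        problems.append(f"{name} must be an integer, got {value}")
    if not entry["low"] <= value <= entry["high"]:
        problems.append(f"{name}={value} is outside [{entry['low']}, {entry['high']}]")
    return problems

sample = {"Tfc": 1000.0, "e": 4, "Sa": 1.3}
for name, value in sample.items():
    for problem in check_value(name, value, spec_entries[name]):
        print(problem)
```

Only `Tfc` is reported here, because `e` and `Sa` satisfy their entries.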

This is what the current `param_config` looks like:

In [6]:
# Load the parameter specification
spec = load_param_config(param_config_yaml)

print("--------------------------------------------")
print("Non-fixed parameters (variables)")
print("--------------------------------------------")
for name in spec.names:
    p = spec.spec_index[name]
    if p["type"] in ("continuous", "integer"):
        print(f"  {name:15} | type={p['type']:10} | low={p['low']} | high={p['high']}")
    elif p["type"] == "categorical":
        print(f"  {name:15} | type=categorical | values={p['values']}")

print("--------------------------------------------")
print("Fixed parameters ")
print("--------------------------------------------")
for name, p in spec.spec_index.items():
    if p.get("fixed", False):
        if "value" in p:
            print(f"  {name:15} | fixed value={p['value']}")
        elif "derived" in p:
            print(f"  {name:15} | derived: {p['derived']}")

print(f"\nTotal non-fixed variables for sensitivity analysis: {len(spec.names)}")
print(f"Bounds array shape: {spec.bounds.shape}")


--------------------------------------------
Non-fixed parameters (variables)
--------------------------------------------
  Tfc             | type=continuous | low=333 | high=363
  Pa_des          | type=continuous | low=130000.0 | high=300000.0
  Sc              | type=continuous | low=1.1 | high=3.0
  Phi_c_des       | type=continuous | low=0.1 | high=0.7
  epsilon_gdl     | type=continuous | low=0.55 | high=0.8
  tau             | type=continuous | low=1.0 | high=4.0
  epsilon_mc      | type=continuous | low=0.15 | high=0.4
  epsilon_c       | type=continuous | low=0.15 | high=0.3
  e               | type=integer    | low=3 | high=5
  Re              | type=continuous | low=5e-07 | high=5e-06
  i0_c_ref        | type=continuous | low=0.001 | high=80
  kappa_co        | type=continuous | low=15 | high=40
  kappa_c         | type=continuous | low=0 | high=2.5
--------------------------------------------
Fixed parameters 
--------------------------------------------
  Pc_des          

Let's validate a sample that omits `Pc_des` but includes `Pa_des`.
The validator computes `Pc_des` from `Pa_des`, provided the YAML declares `Pc_des` as fixed with `derived: "Pa_des - 20000"`.

In [8]:

good_sample = {
    "Tfc": 340.0,
    "Pa_des": 230000.0,
    "Sa": 1.3,               # fixed in YAML in this example
    "Sc": 2.0,
    "Phi_a_des": 0.5,
    "Phi_c_des": 0.4,
    "epsilon_gdl": 0.62,
    "tau": 2.0,
    "epsilon_mc": 0.25,
    "epsilon_c": 0.22,
    "e": 4,                  # integer
    "Re": 1.0e-6,
    "i0_c_ref": 10.0,
    "kappa_co": 25.0,
    "kappa_c": 1.0,
    # Pc_des not provided on purpose, to be derived by validator!
}

aug = validate_sample_against_spec(good_sample, spec)
print("\nAugmented sample after validation and derived application:")
for k in sorted(aug.keys()):
    print(f"  {k}: {aug[k]}")


Augmented sample after validation and derived application:
  Pa_des: 230000.0
  Pc_des: 210000.0
  Phi_a_des: 0.5
  Phi_c_des: 0.4
  Re: 1e-06
  Sa: 1.3
  Sc: 2.0
  Tfc: 340.0
  a_slim: 0
  a_switch: 0.99
  b_slim: 1
  e: 4
  epsilon_c: 0.22
  epsilon_gdl: 0.62
  epsilon_mc: 0.25
  i0_c_ref: 10.0
  kappa_c: 1.0
  kappa_co: 25.0
  tau: 2.0


Let's now provoke validation errors in three ways: we set `Tfc` out of bounds, set `e` to a non-integer, and set `Sa` to a value other than its fixed one.

In [9]:
bad_sample = dict(good_sample)
bad_sample["Tfc"] = 1000.0     # out of bounds on purpose

try:
    validate_sample_against_spec(bad_sample, spec)
except SampleValidationError as e:
    print("\nExpected validation error captured:")
    print(" ", e)

bad_sample = dict(good_sample)
bad_sample["e"] = 5.6   
try:
    validate_sample_against_spec(bad_sample, spec)
except SampleValidationError as e:
    print("\nExpected validation error captured:")
    print(" ", e)

bad_sample = dict(good_sample)
bad_sample["Sa"] = 1.4
try:
    validate_sample_against_spec(bad_sample, spec)
except SampleValidationError as e:
    print("\nExpected validation error captured:")
    print(" ", e)


Expected validation error captured:
  Tfc out of bounds [333.0, 363.0]: 1000.0

Expected validation error captured:
  e must be integer in [3, 5], got 5.6

Expected validation error captured:
  Sa must be fixed to 1.3, got 1.4.


## Minimal demo with three configurations

We will run three rows:
- One good configuration inside the bounds.
- Two faulty configurations that trigger validation errors: one with a non-integer `e`, and one with `Tfc` outside the allowed range.

We turn on `verify=True`, which enforces bounds and computes derived parameters strictly from the YAML (for example `Pc_des` from `Pa_des` if defined).

In [10]:
df_configs = pd.DataFrame([
    {
        "config_id": "good_cfg",
        "Tfc": 345.0,
        "Pa_des": 220000.0,
        "Sc": 2.2,
        "Phi_c_des": 0.3,
        "epsilon_gdl": 0.65,
        "tau": 2.5,
        "epsilon_mc": 0.25,
        "epsilon_c": 0.22,
        "e": 4,
        "Re": 1.2e-6,
        "i0_c_ref": 10.0,
        "kappa_co": 22.0,
        "kappa_c": 1.0
    },
    {
        "config_id": "bad_cfg_1",
        "Tfc": 349.0,
        "Pa_des": 200000.0,
        "Sc": 2.0,
        "Phi_c_des": 0.5,
        "epsilon_gdl": 0.62,
        "tau": 2.0,
        "epsilon_mc": 0.30,
        "epsilon_c": 0.28,
        "e": 4.28,            # non-integer value for an integer parameter
        "Re": 1.0e-6,
        "i0_c_ref": 20.0,
        "kappa_co": 30.0,
        "kappa_c": 0.9
    },
    {
        "config_id": "bad_cfg_2",
        "Tfc": 600.0,  # out of bounds to show error
        "Pa_des": 260000.0,
        "Sc": 1.8,
        "Phi_c_des": 0.4,
        "epsilon_gdl": 0.60,
        "tau": 1.5,
        "epsilon_mc": 0.20,
        "epsilon_c": 0.18,
        "e": 3,
        "Re": 5.0e-7,
        "i0_c_ref": 5.0,
        "kappa_co": 18.0,
        "kappa_c": 0.7
    },
])

df_configs.insert(0, "index", range(len(df_configs)))
df_configs

Unnamed: 0,index,config_id,Tfc,Pa_des,Sc,Phi_c_des,epsilon_gdl,tau,epsilon_mc,epsilon_c,e,Re,i0_c_ref,kappa_co,kappa_c
0,0,good_cfg,345.0,220000.0,2.2,0.3,0.65,2.5,0.25,0.22,4.0,1.2e-06,10.0,22.0,1.0
1,1,bad_cfg_1,349.0,200000.0,2.0,0.5,0.62,2.0,0.3,0.28,4.28,1e-06,20.0,30.0,0.9
2,2,bad_cfg_2,600.0,260000.0,1.8,0.4,0.6,1.5,0.2,0.18,3.0,5e-07,5.0,18.0,0.7


In [11]:
results_df = run_alphapem_from_df(
    df_configs,
    alpha_pem_root=alpha_pem_root,
    simulator_defaults_yaml=simulator_defaults_yaml,
    param_config_yaml=param_config_yaml,
    verify=True,
    run_name="example_three_configs",
    output_dir=str(raw_dir),
    results_format="pkl",
    save_every=2,
    print_errors=True,
)

print("Rows:", len(results_df))
results_df.head(10)

[INFO] Running AlphaPEM on 3 configuration(s)...
[INFO] Directory for results → ..\data\raw\example_three_configs_simulations.pkl
[INFO] Directory for errors   → ..\data\raw\example_three_configs_sim_errors.csv
[ERROR] Could not simulate index=1, config_id=bad_cfg_1
   Error: e must be integer in [3, 5], got 4.28
[INFO] Checkpoint: 2/3 → ..\data\raw\example_three_configs_simulations.pkl
[ERROR] Could not simulate index=2, config_id=bad_cfg_2
   Error: Tfc out of bounds [333.0, 363.0]: 600.0
[INFO] Saved results: ..\data\raw\example_three_configs_simulations.pkl
[INFO] Logged 2 error(s): ..\data\raw\example_three_configs_sim_errors.csv
Rows: 3


Unnamed: 0,index,config_id,Tfc,Pa_des,Sc,Phi_c_des,epsilon_gdl,tau,epsilon_mc,epsilon_c,...,Ucell_22,Ucell_23,Ucell_24,Ucell_25,Ucell_26,Ucell_27,Ucell_28,Ucell_29,Ucell_30,Ucell_31
0,0,good_cfg,345.0,220000.0,2.2,0.3,0.65,2.5,0.25,0.22,...,0.637603,0.628134,0.618733,0.609388,0.600087,0.590818,0.581571,0.572334,0.563099,0.553856
1,1,bad_cfg_1,349.0,200000.0,2.0,0.5,0.62,2.0,0.3,0.28,...,,,,,,,,,,
2,2,bad_cfg_2,600.0,260000.0,1.8,0.4,0.6,1.5,0.2,0.18,...,,,,,,,,,,


After the run you will have:
- `data/raw/example_three_configs_simulations.pkl` with the results table. It includes the raw `ifc` and `Ucell` arrays plus the expanded `ifc_1..ifc_31` and `Ucell_1..Ucell_31` columns.
- `data/raw/example_three_configs_sim_errors.csv` listing any rows that failed validation or raised runtime errors.

You should see two errors for the two faulty rows and one successful simulation.
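
The results table printed above shows empty `Ucell_*` values for the two failed rows. Assuming failed rows are always stored with `NaN` in those columns, a short sketch like the following separates successful from failed configurations. The frame `results_demo` is a stand-in with made-up voltages, not real simulator output.

```python
import numpy as np
import pandas as pd

# Stand-in for results_df: failed rows carry NaN in the Ucell_* columns
results_demo = pd.DataFrame({
    "config_id": ["good_cfg", "bad_cfg_1", "bad_cfg_2"],
    "Ucell_1": [0.95, np.nan, np.nan],  # illustrative values only
    "Ucell_2": [0.90, np.nan, np.nan],
})

# Select the expanded voltage columns and flag rows where all of them are missing
ucell_cols = [c for c in results_demo.columns if c.startswith("Ucell_")]
failed_mask = results_demo[ucell_cols].isna().all(axis=1)

ok_rows = results_demo[~failed_mask]
failed_rows = results_demo[failed_mask]
print("ok:", ok_rows["config_id"].tolist())
print("failed:", failed_rows["config_id"].tolist())
```

The same mask can be applied to `results_df` to keep only successful rows before any further analysis.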

In [12]:
# Inspect the logged errors
err_path = raw_dir / "example_three_configs_sim_errors.csv"
if err_path.exists():
    display(pd.read_csv(err_path))
else:
    print("No errors file found at", err_path)


## Create a Latin Hypercube (LHS) design with 20 samples

We will load the parameter spec from `param_config.yaml`, generate a Latin Hypercube sample for the non-fixed parameters, round the integer parameters, and then save the design as both pickle and CSV under `data/designs/`.

In [13]:
spec = load_param_config(param_config_yaml)
var_names = spec.names
bounds = spec.bounds
types = spec.types
d = len(var_names)

# Number of samples
n_samples = 20


sampler = qmc.LatinHypercube(d=d, seed=123)
lhs_unit = sampler.random(n=n_samples)

# Scale to bounds
samples = bounds[:, 0] + lhs_unit * (bounds[:, 1] - bounds[:, 0])

# Round integer parameters (the array stays float, so 4 is stored as 4.0)
for j, t in enumerate(types):
    if t == "integer":
        samples[:, j] = np.round(samples[:, j])

df_design = pd.DataFrame(samples, columns=var_names)
df_design.insert(0, "index", range(len(df_design)))
df_design.insert(1, "config_id", [f"lhs20_{i:03d}" for i in range(len(df_design))])

design_base = "lhs20_demo"
pkl_path = designs_dir / f"{design_base}.pkl"
csv_path = designs_dir / f"{design_base}.csv"
df_design.to_pickle(pkl_path)
df_design.to_csv(csv_path, index=False)

print(pkl_path)
print(csv_path)

print("------------------------------------------------------------------")
print("Design matrix with", n_samples, "configurations to be simulated")
print("------------------------------------------------------------------")
df_design.head()

..\data\designs\lhs20_demo.pkl
..\data\designs\lhs20_demo.csv
------------------------------------------------------------------
Design matrix with 20 configurations to be simulated
------------------------------------------------------------------


Unnamed: 0,index,config_id,Tfc,Pa_des,Sc,Phi_c_des,epsilon_gdl,tau,epsilon_mc,epsilon_c,e,Re,i0_c_ref,kappa_co,kappa_c
0,0,lhs20_000,348.476472,155042.52134,1.364066,0.364469,0.722801,1.328186,0.150958,0.290426,4.0,4e-06,21.948844,19.693794,1.39697
1,1,lhs20_001,358.179356,242697.530056,2.845156,0.432178,0.647101,3.580131,0.281023,0.223263,4.0,4e-06,9.669981,36.019578,1.998138
2,2,lhs20_002,335.2933,166309.93171,2.532733,0.381234,0.663536,2.820296,0.184773,0.263504,3.0,3e-06,60.812066,32.668473,0.83757
3,3,lhs20_003,344.209437,137892.362144,2.279592,0.122863,0.652938,1.123955,0.321091,0.187391,4.0,4e-06,14.127573,17.340387,0.467805
4,4,lhs20_004,352.495228,296760.924144,1.900292,0.627181,0.589561,3.757526,0.384174,0.278917,4.0,5e-06,49.736022,15.7032,0.075481
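
The property that distinguishes a Latin Hypercube design from plain random sampling is stratification: with `n` samples, each parameter has exactly one point in each of `n` equal-width bins. The sketch below checks this on a small illustrative design (three dimensions, with bounds borrowed from `Tfc`, `Pa_des`, and `Sc`) and uses `qmc.scale` as an alternative to scaling by hand.

```python
import numpy as np
from scipy.stats import qmc

n, d = 20, 3  # small illustrative design, not the full parameter space
unit = qmc.LatinHypercube(d=d, seed=123).random(n=n)

# Each column should place exactly one point in each of the n equal-width bins
bins = np.floor(unit * n).astype(int)
for j in range(d):
    assert sorted(bins[:, j]) == list(range(n))

# qmc.scale maps the unit cube onto [low, high] for each dimension
low = np.array([333.0, 130000.0, 1.1])
high = np.array([363.0, 300000.0, 3.0])
scaled = qmc.scale(unit, low, high)
print(scaled.shape)
```

One consequence worth keeping in mind: rounding a continuous sample to the nearest integer on a range such as `[3, 5]` selects the end values about half as often as the middle value, so the integer column is no longer uniformly stratified.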


## Run the parallel batch script for 10 configs

We will call `scripts/run_sampler_batch_parallel.py` from inside the notebook using `subprocess`. We take the 20-sample design we just saved and run only the *last* 10 rows with multiple workers.

This script:
- splits the DataFrame into worker chunks
- validates rows if `--verify` is used
- writes per-worker checkpoints under `data/raw/temp/<run_name>/`
- merges final results and errors
- writes a metadata JSON with run details
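
How the script selects rows and assigns them to workers is not shown in this notebook. The sketch below assumes that `--offset` and `--n_samples` select a contiguous slice and that the slice is split into contiguous chunks whose sizes differ by at most one; `split_into_chunks` is a hypothetical helper, not a function from the script.

```python
def split_into_chunks(rows, n_workers):
    """Split `rows` into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers receive one additional row
        size = base + (1 if w < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks

design_rows = list(range(20))   # stand-in for the 20 design rows
offset, n_samples = 10, 10      # values passed to the script in this section
subset = design_rows[offset:offset + n_samples]

chunks = split_into_chunks(subset, n_workers=4)
print([len(c) for c in chunks])  # [3, 3, 2, 2]
```

For 10 rows and 4 workers this yields chunk sizes 3, 3, 2, 2, which agrees with the per-worker `ok` counts in the log output of the run in this section.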

Note: If your environment does not have AlphaPEM or the YAML files, this cell will fail. That is expected outside the full project.

In [26]:
def format_command_pretty(args, base=Path("..")):
    """Format an argument list as a readable multi-line shell command."""

    def rel(value):
        # Show paths relative to `base` when possible, otherwise keep the value
        try:
            return str(Path(value).relative_to(base))
        except ValueError:
            return value

    out = ["python"]
    i = 1  # args[0] is the interpreter path, shown simply as "python"
    while i < len(args):
        part = args[i]
        has_value = part.startswith("--") and i + 1 < len(args) and not args[i + 1].startswith("--")
        if has_value:
            # Keep a flag and its value on the same line
            out.append(f"{part} {rel(args[i + 1])}")
            i += 2
        else:
            out.append(rel(part))
            i += 1
    return "\n".join([out[0]] + [f"  {p} \\" for p in out[1:-1]] + [f"  {out[-1]}"])

In [28]:
# File to be executed
script_path = str(parent / "scripts" / "run_sampler_batch_parallel.py")

# Name of the run (saved in meta)
run_name = "notebook_parallel_demo"

args = [
    sys.executable, str(script_path),
    "--input", str(pkl_path),
    "--n_samples", "10",
    "--offset", "10",
    "--alpha_pem_root", alpha_pem_root,
    "--param_config_yaml", param_config_yaml,
    "--simulator_defaults_yaml", simulator_defaults_yaml,
    "--verify",
    "--n_workers", "4",
    "--save_every", "5",
    "--output_dir", str(raw_dir),
    "--run_name", run_name,
    "--format", "csv",
    "--print_errors",
]

print("Running with args:")
print(format_command_pretty(args, base=parent))

Running with args:
python
  scripts\run_sampler_batch_parallel.py \
  --input data\designs\lhs20_demo.pkl \
  --n_samples 10 \
  --offset 10 \
  --alpha_pem_root external\AlphaPEM \
  --param_config_yaml configs\param_config.yaml \
  --simulator_defaults_yaml configs\simulator_defaults.yaml \
  --verify \
  --n_workers 4 \
  --save_every 5 \
  --output_dir data\raw \
  --run_name notebook_parallel_demo \
  --format csv \
  --print_errors


In [25]:
result = subprocess.run(args, capture_output=True, text=True)
print("\nReturn code:", result.returncode)
print("\n--- STDOUT ---\n", result.stdout)
print("\n--- STDERR ---\n", result.stderr)


Return code: 0

--- STDOUT ---
 Parallel AlphaPEM: 10 rows | 4 worker chunk(s) | workers=4
Temp dir: ..\data\raw\temp\notebook_parallel_demo
Worker 3: done | ok=2, err=0 → worker_notebook_parallel_demo_core3.pkl
Chunk saved → worker_notebook_parallel_demo_core3.pkl (ok=2, err=0)
Worker 2: done | ok=2, err=0 → worker_notebook_parallel_demo_core2.pkl
Chunk saved → worker_notebook_parallel_demo_core2.pkl (ok=2, err=0)
Worker 0: done | ok=3, err=0 → worker_notebook_parallel_demo_core0.pkl
Chunk saved → worker_notebook_parallel_demo_core0.pkl (ok=3, err=0)
Worker 1: done | ok=3, err=0 → worker_notebook_parallel_demo_core1.pkl
Chunk saved → worker_notebook_parallel_demo_core1.pkl (ok=3, err=0)
Final results → ..\data\raw\notebook_parallel_demo_simulations.csv
Final errors  → ..\data\raw\notebook_parallel_demo_sim_errors.csv
Totals        → ok=10, err=0


--- STDERR ---
  return bound(*args, **kwds)



After the parallel run finishes you will find:
- `data/raw/notebook_parallel_demo_simulations.csv`
- `data/raw/notebook_parallel_demo_sim_errors.csv`
- `data/raw/notebook_parallel_demo_meta.json`

You can read the metadata to quickly review the run.

In [35]:
meta_path = raw_dir / f"{run_name}_meta.json"
if meta_path.exists():
    with open(meta_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
        fields = ["status","n_subset","n_workers","ok","err","start_time","end_time","duration_seconds","results_path","errors_path","temp_dir"]
        print({k: meta.get(k) for k in fields})
else:
    print("Meta not found at", meta_path)

{'status': 'success', 'n_subset': 10, 'n_workers': 4, 'ok': 10, 'err': 0, 'start_time': '2025-08-11T10:00:59', 'end_time': '2025-08-11T10:07:50', 'duration_seconds': 410.904, 'results_path': 'C:\\Users\\User\\Documents\\0. Semestres\\SS2025\\Official-Sensitivity-Analysis-and-Surrogate-Modeling-of-PEM-Fuel-Cells\\data\\raw\\notebook_parallel_demo_simulations.csv', 'errors_path': 'C:\\Users\\User\\Documents\\0. Semestres\\SS2025\\Official-Sensitivity-Analysis-and-Surrogate-Modeling-of-PEM-Fuel-Cells\\data\\raw\\notebook_parallel_demo_sim_errors.csv', 'temp_dir': 'C:\\Users\\User\\Documents\\0. Semestres\\SS2025\\Official-Sensitivity-Analysis-and-Surrogate-Modeling-of-PEM-Fuel-Cells\\data\\raw\\temp\\notebook_parallel_demo'}
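
A quick consistency check on the metadata can catch incomplete runs early. The sketch below uses the values printed above and assumes two things that the notebook does not state explicitly: that `ok + err` should equal `n_subset`, and that `duration_seconds` is the difference between `end_time` and `start_time`. The function `check_meta` is hypothetical.

```python
from datetime import datetime

# Values copied from the metadata printed above
meta_demo = {
    "status": "success", "n_subset": 10, "n_workers": 4, "ok": 10, "err": 0,
    "start_time": "2025-08-11T10:00:59", "end_time": "2025-08-11T10:07:50",
    "duration_seconds": 410.904,
}

def check_meta(meta):
    """Return a list of inconsistencies found in a run's metadata."""
    issues = []
    if meta["ok"] + meta["err"] != meta["n_subset"]:
        issues.append("ok + err does not match n_subset")
    elapsed = (datetime.fromisoformat(meta["end_time"])
               - datetime.fromisoformat(meta["start_time"])).total_seconds()
    # Timestamps have one-second resolution, so allow a small tolerance
    if abs(elapsed - meta["duration_seconds"]) > 2:
        issues.append("duration_seconds disagrees with start/end times")
    return issues

print(check_meta(meta_demo))  # []
```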


## Wrap up

We've shown how to:
- Use the Python API for quick iteration and debugging in a notebook
- Use the script for larger runs from designs in `data/designs`
- Keep the YAMLs under version control so runs are reproducible

If something fails, check the errors CSV and the metadata JSON for details. The temp folder under `data/raw/temp/<run_name>` is only kept when a run fails or is interrupted.