# 01 – Generate SUHS-MRV UHS Dataset

This notebook generates the SUHS-MRV (Synthetic Underground Hydrogen Storage – MRV) dataset
using the physics-based configuration in `config/uhs_config.yaml` and the generator in `src/generator.py`.

It assumes you have already created a Python environment and installed the dependencies from
`requirements.txt`.


## 0. Imports and paths

We import the generator function, then set the repository root and data directory paths. The paths assume the notebook is run from a directory one level below the repository root.


In [ ]:
from pathlib import Path

import pandas as pd

from src.generator import generate_uhs_dataset


In [ ]:
NOTEBOOK_DIR = Path.cwd()
REPO_ROOT = NOTEBOOK_DIR.parent
DATA_DIR = REPO_ROOT / "data" / "generated"

print("Notebook dir :", NOTEBOOK_DIR)
print("Repo root    :", REPO_ROOT)
print("Data dir     :", DATA_DIR)
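
The import of `src.generator` above only succeeds if the repository root is on Python's module search path, which is not guaranteed when the notebook is launched from a subdirectory. A minimal sketch of one workaround follows. It assumes, as the path cell does, that the notebook sits one level below the repository root; `ensure_on_sys_path` is a hypothetical helper, not part of the repository.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory: Path) -> bool:
    """Prepend `directory` to sys.path if absent; return True if it was added."""
    entry = str(directory.resolve())
    if entry in sys.path:
        return False
    sys.path.insert(0, entry)
    return True


# Assumption: the notebook sits one level below the repository root.
repo_root = Path.cwd().parent
ensure_on_sys_path(repo_root)
```

If a `ModuleNotFoundError` appears, running this before the import cell is one option; installing the repository as a package would be another.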


## 1. Generate the dataset

The cell below calls `generate_uhs_dataset()`, which:

1. Loads configuration from `config/uhs_config.yaml`.
2. Builds the time index and facility metadata.
3. Runs the physics-based simulation for each facility.
4. Writes three CSV files into `data/generated/`:
   - `facility_metadata.csv`
   - `facility_timeseries.csv`
   - `cycle_summary.csv`


In [ ]:
facility_df, timeseries_df, cycle_summary_df = generate_uhs_dataset()

print("Facility metadata rows :", len(facility_df))
print("Timeseries rows        :", len(timeseries_df))
print("Cycle summary rows     :", len(cycle_summary_df))
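
The row counts confirm that the DataFrames came back, but not that the files landed on disk. As a sketch, the check below looks for the three file names listed above. `missing_outputs` is a hypothetical helper, and the output directory is assumed to be `DATA_DIR` from the path cell.

```python
from pathlib import Path

# File names taken from the list of outputs described above.
EXPECTED_FILES = (
    "facility_metadata.csv",
    "facility_timeseries.csv",
    "cycle_summary.csv",
)


def missing_outputs(data_dir: Path) -> list[str]:
    """Return the expected CSV names that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
```

In the notebook, `missing_outputs(DATA_DIR)` should return an empty list after a successful run.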


## 2. Quick peek at the generated files

Below we show the first five rows of each DataFrame returned by the generator as a quick sanity check.


In [ ]:
facility_preview = facility_df.head()
timeseries_preview = timeseries_df.head()
cycle_summary_preview = cycle_summary_df.head()

facility_preview


In [ ]:
timeseries_preview


In [ ]:
cycle_summary_preview
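
To inspect the files on disk rather than the returned objects, the CSVs can be read back with pandas. This is a sketch that assumes the files are plain comma-separated text with a header row and live in `DATA_DIR`; `load_generated` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd


def load_generated(data_dir: Path) -> dict[str, pd.DataFrame]:
    """Read each generated CSV in data_dir into a DataFrame keyed by file stem."""
    names = ("facility_metadata", "facility_timeseries", "cycle_summary")
    return {name: pd.read_csv(data_dir / f"{name}.csv") for name in names}
```

For example, `frames = load_generated(DATA_DIR)` followed by `len(frames["facility_timeseries"]) == len(timeseries_df)` would be expected to hold if the generator writes the same tables it returns.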
