# 01 – Generate SUHS‑MRV UHS Dataset

This notebook regenerates the **SUHS‑MRV v2.0** Underground Hydrogen Storage dataset using `src/generator.py`.

It will:

1. Add the repository root to `sys.path` so `src` can be imported from the `notebooks/` folder.
2. Call the dataset generator.
3. List the generated CSV files with their sizes, then preview the first rows of each.

Run this notebook from the `notebooks/` directory of the repository: the path setup cell derives the repository root from the current working directory.
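
If the kernel might be started from somewhere other than `notebooks/`, the repository root can instead be found by walking up from the working directory. The following is only a sketch: `find_repo_root` and its `marker` argument are hypothetical names that are not part of the repository, and it assumes `src/generator.py` exists only under the repository root.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = "src/generator.py") -> Path:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`."""
    start = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains {marker}")


# Possible use in place of NOTEBOOK_DIR.parent:
# REPO_ROOT = find_repo_root(Path.cwd())
```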

In [None]:
import sys
from pathlib import Path

import pandas as pd

In [None]:
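# Assumes the kernel's working directory is notebooks/.
NOTEBOOK_DIR = Path.cwd()
REPO_ROOT = NOTEBOOK_DIR.parent
DATA_DIR = REPO_ROOT / 'data' / 'generated'

# Make the repository root importable so that `from src...` resolves.
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))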

print('Notebook dir:', NOTEBOOK_DIR)
print('Repo root   :', REPO_ROOT)
print('Data dir    :', DATA_DIR)

In [None]:
from src.generator import generate_uhs_dataset

facility_df, timeseries_df, cycle_summary_df = generate_uhs_dataset()

print('Generation complete.')
print('facility_metadata rows :', len(facility_df))
print('facility_timeseries rows:', len(timeseries_df))
print('cycle_summary rows      :', len(cycle_summary_df))
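
The later cells read three CSV files from `DATA_DIR`, so it can help to confirm that they exist right after generation. This is a minimal sketch that assumes the generator writes those files into `DATA_DIR`; `EXPECTED_FILES` and `missing_outputs` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# File names taken from the preview cells of this notebook.
EXPECTED_FILES = (
    "facility_metadata.csv",
    "facility_timeseries.csv",
    "cycle_summary.csv",
)


def missing_outputs(data_dir: Path, expected=EXPECTED_FILES) -> list[str]:
    """Return the expected file names that are not present in `data_dir`."""
    return [name for name in expected if not (data_dir / name).is_file()]


# Possible use after the generator call:
# missing = missing_outputs(DATA_DIR)
# if missing:
#     raise FileNotFoundError(f"Generator did not write: {missing}")
```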

## Inspect generated CSV files on disk

In [None]:
DATA_DIR.mkdir(parents=True, exist_ok=True)

for path in sorted(DATA_DIR.glob('*.csv')):
    size_kb = path.stat().st_size / 1024.0
    print(f"{path.name:30s}  {size_kb:8.1f} KB")
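
File sizes alone say little about table shape. If row and column counts are wanted without loading each file into pandas, a standard-library sketch along the following lines could be used. `csv_shape` is a hypothetical helper, and it assumes each file starts with a single header row.

```python
import csv
from pathlib import Path


def csv_shape(path: Path) -> tuple[int, int]:
    """Return (data_rows, columns) for a CSV file with a single header row."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        n_rows = sum(1 for _ in reader)    # remaining records are data rows
    return n_rows, len(header)


# Possible use inside the listing loop:
# n_rows, n_cols = csv_shape(path)
```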

## Quick preview of each CSV

In [None]:
# Reload from disk; this rebinds facility_df, replacing the frame returned by the generator.
facility_path = DATA_DIR / 'facility_metadata.csv'
facility_df = pd.read_csv(facility_path)
facility_df.head()

In [None]:
ts_path = DATA_DIR / 'facility_timeseries.csv'
timeseries_df = pd.read_csv(ts_path, parse_dates=['timestamp'])
timeseries_df.head()

In [None]:
cycle_path = DATA_DIR / 'cycle_summary.csv'
cycle_summary_df = pd.read_csv(cycle_path, parse_dates=['cycle_start', 'cycle_end'])
cycle_summary_df.head()
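
With `cycle_start` and `cycle_end` parsed as dates, one quick sanity check is to look for rows where a cycle ends before it starts. The sketch below assumes each row of `cycle_summary.csv` describes a single cycle, which the source does not state. `inverted_cycles` is a hypothetical helper.

```python
import pandas as pd


def inverted_cycles(cycles: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose cycle_end is earlier than cycle_start."""
    return cycles[cycles["cycle_end"] < cycles["cycle_start"]]


# Possible use on the frame loaded above:
# bad = inverted_cycles(cycle_summary_df)
# print(len(bad), "cycles end before they start")
```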