# PyPSA Statistics Module Training

A guided ~40-min session exploring `n.statistics` — PyPSA's high-level API for
querying costs, capacities, energy flows, and market metrics from optimized networks.

We use `pypsa.examples.carbon_management()`, a sector-coupled European energy system
from a [Nature Energy paper](https://www.nature.com/articles/s41560-025-01752-6) on
H₂/CO₂ network strategies (2164 buses, 89 carriers, 20 days at 3h resolution).

## 0. Setup & Imports

In [None]:
import pypsa
import pandas as pd
import matplotlib.pyplot as plt

pd.options.display.float_format = "{:,.1f}".format

## 1. Motivation & Network Overview

Manual DataFrame wrangling for multi-carrier, multi-component networks is error-prone
and verbose. `n.statistics` provides a consistent, high-level API that handles
component iteration, port mapping, and carrier grouping automatically.
(`n.stats` is available as a shorthand alias for `n.statistics`.)

In [None]:
n = pypsa.examples.carbon_management()

In [None]:
print(f"Buses: {len(n.buses):,}")
print(f"Carriers: {len(n.carriers)}")
print(f"Snapshots: {len(n.snapshots)} ({n.snapshots[0]} → {n.snapshots[-1]})")
print()
for name, comp in [("generators", n.generators), ("links", n.links), ("stores", n.stores),
                    ("storage_units", n.storage_units), ("loads", n.loads), ("lines", n.lines)]:
    print(f"{name:20s} {len(comp):>6,}")

In [None]:
n.buses.carrier.value_counts().head(15)

The **summary table** `n.statistics()` gives a quick overview of all metrics at once.

In [None]:
n.statistics()

## 2. Cost Analysis

Three cost methods form a hierarchy: **capex** (capital = installed_capex + expanded_capex),
**opex** (operational), and **system_cost** (capex + opex combined).
Additionally, `installed_capex` and `expanded_capex` break down capital costs
by existing vs newly built capacity.

In [None]:
capex = n.statistics.capex()
capex.head(10)

In [None]:
opex = n.statistics.opex()
opex.head(10)

In [None]:
costs = pd.concat(
    [capex.rename("CAPEX"), opex.rename("OPEX")], axis=1
).dropna(how="all").div(1e9)

costs.sort_values("CAPEX", ascending=False).head(15).plot.barh(
    title="Top 15 Technologies by CAPEX (bn €)", figsize=(8, 5)
)
plt.xlabel("bn €")
plt.tight_layout()

In [None]:
system_cost = n.statistics.system_cost()
print(f"Total system cost: {system_cost.sum() / 1e9:.1f} bn €")

The `groupby_method` parameter controls how values are aggregated within each group.
Compare `"sum"` (default, total) vs `"mean"` (average per asset).

In [None]:
pd.concat([
    n.statistics.capex(groupby_method="sum").rename("sum"),
    n.statistics.capex(groupby_method="mean").rename("mean"),
], axis=1).head(10)
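
To make the difference between the two aggregations concrete, here is a toy pandas sketch. The asset names and cost figures are made up, and the code only mimics what a grouped sum or mean does; it is not PyPSA's internal implementation.

```python
import pandas as pd

# Hypothetical per-asset capital costs in EUR; names and numbers are made up.
asset_capex = pd.DataFrame(
    {
        "carrier": ["solar", "solar", "solar", "onwind", "onwind"],
        "capex": [10.0, 20.0, 30.0, 40.0, 60.0],
    },
    index=["solar 1", "solar 2", "solar 3", "wind 1", "wind 2"],
)

# Group the assets by carrier, then aggregate each group in two ways.
grouped = asset_capex.groupby("carrier")["capex"]
comparison = pd.concat(
    [grouped.sum().rename("sum"), grouped.mean().rename("mean")], axis=1
)
print(comparison)
```

The `sum` column answers "how much does this carrier cost in total?", while `mean` answers "how much does a typical asset of this carrier cost?".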

## 3. Capacity Analysis

Four capacity methods: **installed** (existing before optimization),
**optimal** (post-optimization), **expanded** (newly built = optimal − installed),
and **capacity_factor** (utilization rate).

In [None]:
cap = pd.concat([
    n.statistics.installed_capacity().rename("Installed"),
    n.statistics.optimal_capacity().rename("Optimal"),
    n.statistics.expanded_capacity().rename("Expanded"),
], axis=1).dropna(how="all")

cap.head(15)

In [None]:
cf = n.statistics.capacity_factor()
cf.sort_values(ascending=False).head(15)
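
The relationships between these quantities can be checked on a toy table. The capacities and dispatch values below are invented, and the capacity factor is assumed here to be average dispatch divided by optimal capacity with equally weighted snapshots.

```python
import pandas as pd

# Made-up capacities in MW for two hypothetical technologies.
installed = pd.Series({"solar": 100.0, "onwind": 50.0})
optimal = pd.Series({"solar": 250.0, "onwind": 50.0})

# Newly built capacity is the difference between optimal and installed.
expanded = optimal - installed

# Made-up dispatch in MW over four equally weighted snapshots.
dispatch = pd.DataFrame(
    {"solar": [0.0, 125.0, 250.0, 125.0], "onwind": [20.0, 30.0, 10.0, 20.0]}
)

# Capacity factor (assumed definition): average dispatch over optimal capacity.
capacity_factor = dispatch.mean() / optimal
print(expanded)
print(capacity_factor)
```

In this sketch `onwind` is not expanded at all, so its expanded capacity is zero while its capacity factor is still well defined.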

Filter by carrier to focus on a single technology.

In [None]:
n.statistics.capacity_factor(carrier="solar")

## 4. Energy Flows

- **supply** / **withdrawal** — one-directional energy production / consumption
- **energy_balance** — net energy (positive = supply, negative = withdrawal)
- **transmission** — energy flowing through branch components
- **curtailment** — wasted generation potential
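
A minimal pandas sketch of how supply, withdrawal, and energy balance relate, using invented flows at a single bus. It assumes that withdrawal is reported as a positive magnitude; check the sign convention of your PyPSA version before relying on it.

```python
import pandas as pd

# Made-up signed flows in MWh at one bus: positive feeds in, negative draws out.
flows = pd.Series(
    {
        "solar": 120.0,
        "battery discharger": 30.0,
        "battery charger": -40.0,
        "load": -110.0,
    }
)

supply = flows.clip(lower=0)          # keep only the feed-in part
withdrawal = -flows.clip(upper=0)     # keep only the draw, as a positive magnitude
energy_balance = supply - withdrawal  # recovers the signed flows

print(pd.concat({"supply": supply, "withdrawal": withdrawal,
                 "balance": energy_balance}, axis=1))
```

Because the toy bus is balanced, total supply equals total withdrawal and the signed balance sums to zero.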

In [None]:
n.statistics.supply().sort_values(ascending=False).head(10)

In [None]:
n.statistics.withdrawal().sort_values(ascending=False).head(10)

In [None]:
eb = n.statistics.energy_balance()
eb.head(15)

The default groupby for `energy_balance` is `["carrier", "bus_carrier"]`.
We can use `.xs()` to extract a specific bus carrier.

In [None]:
ac_balance = eb.xs("AC", level="bus_carrier")
ac_balance.droplevel(0).sort_values().plot.barh(
    figsize=(8, 6)
)
plt.xlabel("MWh")
plt.tight_layout()
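
For readers less familiar with `.xs()`, the sketch below applies it to a small hand-built Series. The index levels imitate a grouped statistics result, but the rows and values are made up.

```python
import pandas as pd

# Made-up balance values in MWh, indexed like a grouped statistics result.
index = pd.MultiIndex.from_tuples(
    [
        ("Generator", "solar", "AC"),
        ("Link", "electrolysis", "AC"),
        ("Link", "electrolysis", "H2"),
        ("Load", "electricity", "AC"),
    ],
    names=["component", "carrier", "bus_carrier"],
)
toy_balance = pd.Series([100.0, -30.0, 20.0, -70.0], index=index)

# Select every row on AC buses; xs drops the selected level from the index.
toy_ac = toy_balance.xs("AC", level="bus_carrier")
print(toy_ac)
```

The result keeps only the `component` and `carrier` levels, which is why a further `droplevel(0)` leaves a plain carrier index for plotting.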

In [None]:
n.statistics.transmission()

In [None]:
curt = n.statistics.curtailment()
curt[curt > 0].sort_values(ascending=False)

## 5. Market Metrics

- **prices** — marginal prices per bus
- **revenue** — income earned by each component
- **market_value** — revenue per unit of energy (€/MWh)
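
The difference between a market value and a plain average price is easiest to see with numbers. The prices and solar output below are invented, and the formula (revenue divided by energy produced) follows the description in the list above rather than PyPSA's source code.

```python
import pandas as pd

# Made-up prices (EUR/MWh) and solar output (MWh) for four snapshots.
price = pd.Series([20.0, 40.0, 60.0, 80.0])
solar_output = pd.Series([0.0, 30.0, 10.0, 0.0])

revenue = (price * solar_output).sum()       # EUR
market_value = revenue / solar_output.sum()  # EUR/MWh, output-weighted price
average_price = price.mean()                 # EUR/MWh, unweighted

print(f"revenue: {revenue:.0f} EUR")
print(f"market value: {market_value:.1f} EUR/MWh")
print(f"average price: {average_price:.1f} EUR/MWh")
```

Here the generator produces mostly in cheaper hours, so its market value falls below the average price.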

In [None]:
prices = n.statistics.prices()
print(f"Mean marginal price (all buses): {prices.mean():.2f} €/MWh")
prices.hist(bins=40, figsize=(8, 3), edgecolor="white")
plt.xlabel("€/MWh")
plt.title("Distribution of Marginal Prices")
plt.tight_layout()

In [None]:
n.statistics.revenue().sort_values(ascending=False).head(10)

In [None]:
n.statistics.market_value().sort_values(ascending=False).head(10)

## 6. Groupby & Filtering

Most statistics methods accept the same filtering and grouping parameters:

| Parameter | Description |
|---|---|
| `groupby` | String, list, or callable that determines how results are grouped (default for most methods: `"carrier"`) |
| `groupby_method` | Aggregation function (`"sum"` (default), `"mean"`, …) |
| `groupby_time` | `"sum"`, `"mean"`, or `False` for time series — default varies by method |
| `components` | Filter to specific component types |
| `carrier` | Filter by carrier name (internal name) |
| `bus_carrier` | Filter by the carrier of the bus |
| `nice_names` | Use human-readable carrier names (default: `True`) |

Note: `prices()` has a simplified interface — `groupby` and `groupby_time` are booleans,
and it does not accept `carrier` or `components`.

### Default groupby vs custom groupby

In [None]:
n.statistics.capex(groupby="bus_carrier").head(5)

In [None]:
n.statistics.capex(groupby=["bus_carrier", "carrier"]).head(10)

### Component filtering

In [None]:
n.statistics.supply(components=["Generator", "Link"]).head(10)

### Time series mode

Set `groupby_time=False` to get MW time series instead of MWh aggregates.

In [None]:
ts = n.statistics.energy_balance(groupby_time=False)
print(f"Shape: {ts.shape}; rows are (component, carrier, bus_carrier), columns are snapshots")
ts.iloc[:5, :5]

In [None]:
ts_mean = n.statistics.energy_balance(groupby_time="mean")
ts_sum = n.statistics.energy_balance(groupby_time="sum")

pd.concat([
    ts_mean.rename("mean (MW)"),
    ts_sum.rename("sum (MWh)"),
], axis=1).head(10)
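
The two time aggregations can be reproduced by hand on a toy series. The sketch assumes that each snapshot is weighted by its duration, which matches the 3-hour resolution of this network; the dispatch values and dates are made up.

```python
import pandas as pd

# Made-up dispatch in MW for four snapshots, each representing 3 hours.
snapshots = pd.date_range("2013-01-01", periods=4, freq=pd.Timedelta(hours=3))
power = pd.Series([10.0, 20.0, 30.0, 20.0], index=snapshots)
weights = pd.Series(3.0, index=snapshots)  # hours represented by each snapshot

energy_mwh = (power * weights).sum()  # "sum": energy over the whole period
mean_mw = energy_mwh / weights.sum()  # "mean": duration-weighted average power

print(f"sum:  {energy_mwh:.0f} MWh")
print(f"mean: {mean_mw:.0f} MW")
```

With uniform weights the weighted mean equals the plain mean of the power series, but the sum is three times the naive total because each snapshot stands for three hours.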

### Built-in groupers

Available grouper names: `bus`, `bus_carrier`, `carrier`, `country`, `location`, `name`, `unit`.

In [None]:
from pypsa.statistics import groupers

print("Available groupers:", list(groupers.list_groupers().keys()))

In [None]:
n.statistics.capex(groupby="country").head(10)

### Custom grouper

Register a custom grouper via `groupers.add_grouper(name, func)`. The function
signature is `(n, c, port) -> pd.Series` where `c` is the component name.

In [None]:
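def tech_type(n, c, port=""):
    # Carrier of every asset in component `c`.
    carriers = n.c[c].static["carrier"]
    # Map to nice names; fall back to the raw carrier where none is defined.
    nice_names = carriers.map(n.carriers.nice_name).fillna(carriers)
    conditions = {
        "Renewable": ["wind", "solar", "ror", "hydro"],
        "Conventional": ["gas", "oil", "nuclear"],
        "Storage": ["battery", "storage", "reservoir"],
    }
    def classify(name):
        # Keywords match as substrings; the first matching label wins.
        lower = name.lower()
        for label, keywords in conditions.items():
            if any(kw in lower for kw in keywords):
                return label
        return "Other"
    # One label per asset, named after the grouper.
    return nice_names.map(classify).rename("tech_type")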

groupers.add_grouper("tech_type", tech_type)

In [None]:
n.statistics.capex(groupby="tech_type")

In [None]:
n.statistics.supply(groupby=["tech_type", "carrier"]).head(15)
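
Keyword-based classification like the grouper above is sensitive to substring collisions and to the order of the labels. The standalone sketch below reuses the same keyword table on a few hypothetical carrier names (not taken from this network) to show where that can go wrong.

```python
# Keyword table mirroring the structure of the grouper (illustrative only).
KEYWORDS = {
    "Renewable": ["wind", "solar", "ror", "hydro"],
    "Conventional": ["gas", "oil", "nuclear"],
    "Storage": ["battery", "storage", "reservoir"],
}


def classify(name: str) -> str:
    """Return the first label whose keyword occurs as a substring of name."""
    lower = name.lower()
    for label, keywords in KEYWORDS.items():
        if any(kw in lower for kw in keywords):
            return label
    return "Other"


# Hypothetical carrier names chosen to expose substring collisions.
examples = ["Onshore Wind", "Hydrogen Storage", "biogas", "battery", "Run of River"]
labels = {name: classify(name) for name in examples}
for name, label in labels.items():
    print(f"{name:20s} -> {label}")
```

Before trusting the grouped totals, it is worth printing the label assigned to every carrier in the network and checking the ambiguous ones by hand; exact-match lookups are a safer alternative when the carrier list is known.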

## 7. Plotting from Statistics (v1.0 feature)

Every statistics method has `.plot` (matplotlib) and `.iplot` (plotly) accessors.
Plotting methods inherit all statistics parameters — groupby, carrier filtering, etc.

Available plot types: `bar`, `line`, `area`, `box`, `scatter`, `histogram`, `violin`, `chart`, `map`.

In [None]:
n.statistics.optimal_capacity.plot.bar()

In [None]:
n.statistics.supply.plot.bar(carrier=["onwind", "solar", "offwind"])

In [None]:
n.statistics.capex.plot.bar()

In [None]:
n.statistics.energy_balance.iplot.area(
    x="snapshot",
    title="Energy Balance Time Series (interactive)",
)

## 8. Recap

| Category | Methods |
|---|---|
| **Costs** | `capex()`, `installed_capex()`, `expanded_capex()`, `opex()`, `system_cost()` |
| **Capacity** | `installed_capacity()`, `optimal_capacity()`, `expanded_capacity()`, `capacity_factor()` |
| **Energy** | `supply()`, `withdrawal()`, `energy_balance()`, `transmission()`, `curtailment()` |
| **Market** | `prices()`, `revenue()`, `market_value()` |
| **Overview** | `n.statistics()` (summary table) |

Key parameters: `groupby`, `groupby_method`, `groupby_time`, `components`, `carrier`, `bus_carrier`, `nice_names`.

**Docs**: [docs.pypsa.org/latest/user-guide/statistics](https://docs.pypsa.org/latest/user-guide/statistics/)