# TPI Benchmark Data Pipeline

The benchmark data pipelines organize and assemble the benchmark data needed by the ITR tool.  This pipeline handles the TPI sector benchmark scenarios (published 20 October 2022).


### Environment variables and dot-env

The following cell looks for a "dot-env" file in some standard locations,
and loads its contents into `os.environ`.

In [1]:
import os
import pathlib
import numpy as np
import pandas as pd

# import python_pachyderm

In [2]:
import json
from math import log10

In [3]:
# See data-platform-demo/pint-demo.ipynb for quantify/dequantify functions

from pint_pandas import PintArray
from common_units import ureg

Q_ = ureg.Quantity
PA_ = PintArray

Initializing common units...


Define Environment and Execution Variables

### S3 and boto3

### Connecting to Trino with sqlalchemy

In the context of the Data Vault, this pipeline operates with full visibility into all the data it prepares for the ITR tool.  When the data is output, it is labeled so that the Data Vault can enforce its data management access rules.

In [4]:
# TPI Benchmark arrives in DataFrame-ready format.  Read the CSV file and then we'll tidy it up

benchmark_TPI_dir = os.path.abspath("../data/external/TPI 20221022")

csv_df = pd.read_csv(pathlib.Path(benchmark_TPI_dir, "Sector_Benchmarks_20102022.csv"))
csv_df["Release date"] = pd.to_datetime(csv_df["Release date"], dayfirst=True)

In [5]:
bm_dict = {}
for scenario_name in csv_df["Scenario name"].unique():
    # Until we know the temperature targets of the pledges, don't deal with those as benchmarks per se
    if "Pledges" in scenario_name:
        continue
    if scenario_name == "1.5 Degrees":
        benchmark_temperature = 1.5
        benchmark_global_budget = 396  # 66% probability; 500 Gt 50% probability
    elif scenario_name == "Below 2 Degrees":
        benchmark_temperature = 1.65  # 66% probability
        benchmark_global_budget = 646
    else:
        benchmark_temperature = 2.0
        benchmark_global_budget = (
            1229  # starting from 1.5 @ 66% prob, plus 0.5C at 0.0006 tcre
        )
    df = csv_df[csv_df["Scenario name"].eq(scenario_name)]
    idx = (
        df.groupby(["Sector name", "Region"])["Release date"].transform("max")
        == df["Release date"]
    )
    df = df.loc[idx].copy()
    df["benchmark_temperature"] = benchmark_temperature
    df["benchmark_global_budget"] = benchmark_global_budget
    df.Unit = (
        df.Unit.str.replace("Carbon intensity ", "")
        .str.replace("Emissions intensity ", "")
        .str.replace("metric tonnes of", "t")
        .str.replace("CO2e", "CO2")
        .str.replace("gCO2", "g CO2")
        .str.replace("tonnes of", "t")
        .str.replace("t-km", "tkm")
        .str.replace("RTK", "tkm")
        .str.replace("/ t aluminium", "/(t Aluminum)")
        .str.replace(" per tonne of cementitious product", "/(t Cement)")
        .str.replace("tonne copper equivalent", "(t Copper)")
        .str.replace(" per tonne of steel", "/(t Steel)")
        .str.replace(" per MWh electricity generation", "/MWh")
        .str.replace(" per tonne of pulp, paper and paperboard", "/(t Paper)")
        .str.replace("tonne ", "t ")
        .str.replace("tCO2", "t CO2")
        .map(lambda x: x[1:-1])
    )
    df.Region = df.Region.str.replace("North-America", "North America")
    bm_dict[scenario_name] = df
print(bm_dict.keys())
display(bm_dict["2 Degrees"])

dict_keys(['1.5 Degrees', 'Below 2 Degrees', '2 Degrees (Shift-Improve)', '2 Degrees (High Efficiency)', '2 Degrees'])


| | Benchmark ID | Sector name | Scenario name | Region | Release date | Unit | 2013 | 2014 | 2015 | 2016 | ... | 2043 | 2044 | 2045 | 2046 | 2047 | 2048 | 2049 | 2050 | benchmark_temperature | benchmark_global_budget |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | Aluminium_01/02/2021 | Aluminium | 2 Degrees | Global | 2021-02-01 | t CO2 /(t Aluminum) | | 6.342 | 6.161 | 5.98 | ... | 1.65 | 1.584 | 1.519 | 1.478 | 1.437 | 1.397 | 1.356 | 1.316 | 2.0 | 1229 |
| 37 | Cement_01/02/2021 | Cement | 2 Degrees | Global | 2021-02-01 | t CO2/(t Cement) | | 0.488 | 0.488 | 0.489 | ... | 0.359 | 0.351 | 0.343 | 0.335 | 0.327 | 0.319 | 0.312 | 0.304 | 2.0 | 1229 |
| 54 | Diversified Mining_01/02/2021 | Diversified Mining | 2 Degrees | Global | 2021-02-01 | t CO2 / (t Copper) | | 61.985 | 61.497 | 60.939 | ... | 34.072 | 33.11 | 32.135 | 31.5 | 30.855 | 30.2 | 29.534 | 28.858 | 2.0 | 1229 |
| 76 | Electricity Utilities_01/10/2020 | Electricity Utilities | 2 Degrees | Global | 2020-10-01 | t CO2/MWh | 0.586 | 0.572 | 0.553 | 0.534 | ... | 0.072 | 0.064 | 0.056 | 0.052 | 0.048 | 0.044 | 0.04 | 0.036 | 2.0 | 1229 |
| 88 | Oil & Gas_01/10/2020 | Oil & Gas | 2 Degrees | Global | 2020-10-01 | g CO2 / MJ | | 65.57 | 64.69 | 63.81 | ... | 29.74 | 28.51 | 27.27 | 26.16 | 25.05 | 23.93 | 22.82 | 21.7 | 2.0 | 1229 |
| 95 | Paper_01/02/2021 | Paper | 2 Degrees | Global | 2021-02-01 | t CO2/(t Paper) | | 0.706 | 0.685 | 0.664 | ... | 0.168 | 0.148 | 0.127 | 0.127 | 0.128 | 0.128 | 0.128 | 0.128 | 2.0 | 1229 |
| 110 | Shipping_01/12/2020 | Shipping | 2 Degrees | Global | 2020-12-01 | g CO2 / tkm | | | 10.57 | 10.38 | ... | 4.11 | 3.83 | 3.55 | 3.27 | 2.99 | 2.71 | 2.42 | 2.14 | 2.0 | 1229 |
| 118 | Steel_01/02/2021 | Steel | 2 Degrees | Global | 2021-02-01 | t CO2/(t Steel) | | 1.669 | 1.639 | 1.61 | ... | 0.731 | 0.704 | 0.677 | 0.667 | 0.655 | 0.644 | 0.632 | 0.621 | 2.0 | 1229 |
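
The unit clean-up in cell 5 is easier to follow on a single string. The sketch below replays the same ordered replacements with plain `str.replace` (always literal, no regex) and then drops the first and last character, as the final `.map(lambda x: x[1:-1])` does. The two raw labels are hypothetical: they are assumed to resemble the CSV's `Unit` column, which is not shown in this notebook, and `normalize_unit` is a name introduced only for illustration.

```python
# Ordered (old, new) pairs copied from the .str.replace chain in cell 5.
UNIT_REPLACEMENTS = [
    ("Carbon intensity ", ""),
    ("Emissions intensity ", ""),
    ("metric tonnes of", "t"),
    ("CO2e", "CO2"),
    ("gCO2", "g CO2"),
    ("tonnes of", "t"),
    ("t-km", "tkm"),
    ("RTK", "tkm"),
    ("/ t aluminium", "/(t Aluminum)"),
    (" per tonne of cementitious product", "/(t Cement)"),
    ("tonne copper equivalent", "(t Copper)"),
    (" per tonne of steel", "/(t Steel)"),
    (" per MWh electricity generation", "/MWh"),
    (" per tonne of pulp, paper and paperboard", "/(t Paper)"),
    ("tonne ", "t "),
    ("tCO2", "t CO2"),
]


def normalize_unit(raw):
    """Apply the replacements in order, then strip the enclosing characters."""
    for old, new in UNIT_REPLACEMENTS:
        raw = raw.replace(old, new)
    # Mirrors .map(lambda x: x[1:-1]); assumes the label is wrapped in parentheses.
    return raw[1:-1]


# Hypothetical raw labels (assumed format, not taken from the CSV).
steel_raw = "Carbon intensity (metric tonnes of CO2e per tonne of steel)"
shipping_raw = "Emissions intensity (gCO2e / t-km)"

print(normalize_unit(steel_raw))
print(normalize_unit(shipping_raw))
```

Order matters here: `"metric tonnes of"` has to be rewritten before the shorter `"tonnes of"`, and `"CO2e"` before `"gCO2"`, or the later patterns would no longer match what the earlier ones expect.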


In [6]:
df = bm_dict["1.5 Degrees"]

df[["Sector name", "Region", "Unit", "2019", "2030", "2050"]]

| | Sector name | Region | Unit | 2019 | 2030 | 2050 |
|---|---|---|---|---|---|---|
| 0 | Airlines | Global | g CO2 / tkm | 1020.0 | 616.0 | 108.0 |
| 33 | Cement | Global | t CO2/(t Cement) | 0.545 | 0.419 | 0.031 |
| 47 | Diversified Mining | Global | t CO2 / (t Copper) | 59.03 | 42.96 | 1.21 |
| 59 | Electricity Utilities | non-OECD | t CO2/MWh | 0.564 | 0.179 | 0.0 |
| 60 | Electricity Utilities | North America | t CO2/MWh | 0.328 | 0.068 | 0.0 |
| 65 | Electricity Utilities | OECD | t CO2/MWh | 0.329 | 0.064 | 0.0 |
| 70 | Electricity Utilities | Europe | t CO2/MWh | 0.259 | 0.046 | 0.0 |
| 71 | Electricity Utilities | Global | t CO2/MWh | 0.468 | 0.138 | 0.0 |
| 86 | Oil & Gas | Global | g CO2 / MJ | 62.88 | 40.95 | 5.85 |
| 106 | Shipping | Global | g CO2 / tkm | 7.82 | 4.31 | 0.4 |


### Construct JSON benchmark structures

0.  TPI provides annual benchmark values, so there is no need to interpolate
1.  TPI defines region-specific benchmarks for Electricity Utilities; all others are Global
2.  Different sectors have different scopes for benchmarks (S1, S1S2, S1S2S3)
3.  Only emit the latest version of the benchmark
4.  There are several potential global carbon budgets:
    a.  50/50 chance of 1.5C
    b.  66% chance of 1.5C
    c.  Below 2 degrees == 1.65C
    d.  2 degrees (Shift-improve, High-efficiency, Default)

In [7]:
# https://til.simonwillison.net/python/json-floating-point
# Modified to blend the concept of "precision after the decimal point" with "significant figures" (SF).
# For numbers in (-1,1), gives PRECISION=3 sig figs.  For numbers outside that range, but within (-10,10), one additional SF.
# Will provide up to PRECISION-1 additional SFs (default 2) for larger absolute magnitudes.


# from math import log10
def round_floats(o, precision=3):
    if isinstance(o, float):
        if o == 0 or np.isnan(o):
            return 0
        lo = int(log10(abs(o))) - (abs(o) > 10)
        if precision + lo < 0:
            return 0
        if precision * 2 < lo:
            return round(o)
        return round(o, precision - lo)
    if isinstance(o, dict):
        return {k: round_floats(v, precision) for k, v in o.items()}
    if isinstance(o, (list, tuple)):
        return [round_floats(x, precision) for x in o]
    if isinstance(o, pd.Timestamp):
        dt, hms = str(o).split(" ")
        if hms == "00:00:00":
            return dt
        return str(o)
    return o
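
To see how the magnitude adjustment in `round_floats` plays out, the sketch below isolates just the computation of the `ndigits` argument passed to `round()`. `decimals_kept` is a hypothetical helper introduced for illustration. It leaves out the early returns for zero, NaN, very small and very large values, so it is only meaningful for ordinary nonzero floats. The sample values are taken from the benchmark tables above, apart from `0.0123456` and `5.4321`, which are made up.

```python
from math import log10


def decimals_kept(x, precision=3):
    """Return the ndigits value round_floats would pass to round() for x.

    Covers only the main path; the zero/NaN and extreme-magnitude
    shortcuts of round_floats are deliberately left out.
    """
    # int() truncates toward zero, so every |x| < 10 yields lo == 0 or negative.
    lo = int(log10(abs(x))) - (abs(x) > 10)
    return precision - lo


for x in (0.0123456, 0.545, 5.4321, 61.985, 1020.0):
    nd = decimals_kept(x)
    print(f"{x!r}: ndigits={nd}, rounded={round(x, nd)!r}")
```

Values below 0.1 gain extra decimal places so that three significant figures survive, while values above 10 lose decimal places as their magnitude grows.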

In [8]:
ei_sectors_scope = {
    "Electricity Utilities": "S1",
    "Oil & Gas": "S1S2S3",
    "Autos": "S3",
    "Airlines": "S1",
    "Shipping": "S1",
    "Cement": "S1",
    "Diversified Mining": "S1S2S3",
    "Steel": "S1S2",
    "Aluminum": "S1S2",
    "Aluminium": "S1S2",
    "Paper": "S1S2",
}

In [9]:
ei_bms = {}

for scenario_name, df in bm_dict.items():
    try:
        ei_bms[scenario_name] = {
            "benchmark_temperature": f"{df.iloc[0].benchmark_temperature} delta_degC",
            "benchmark_global_budget": f"{df.iloc[0].benchmark_global_budget} Gt CO2",
            "is_AFOLU_included": False,
        }
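    except IndexError:
        # An empty frame has no row 0: report it and skip this scenario,
        # because the scope loop below needs ei_bms[scenario_name] to exist.
        print(df)
        print(scenario_name)
        continue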

    for scope in ["S1", "S1S2", "S1S2S3", "S3"]:
        bm_scope = {
            "benchmarks": [
                {
                    "sector": row["Sector name"],
                    "region": row["Region"],
                    "benchmark_metric": row["Unit"],
                    "scenario name": f"TPI {scenario_name}",
                    "release date": str(row["Release date"]).split(" ")[0],
                    "projections_nounits": [
                        {"year": year, "value": row[str(year)]}
                        for year in range(2019, 2051)
                    ],
                }
                for index, row in df.iterrows()
                if ei_sectors_scope[row["Sector name"]] == scope
            ]
        }
        if len(bm_scope["benchmarks"]):
            ei_bms[scenario_name][scope] = bm_scope
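
For orientation, here is a hand-written sketch of the shape one `ei_bms` entry is expected to take, using the Cement row of the "2 Degrees" table shown earlier. It is an illustration and not output captured from the pipeline: only three of the 32 projection years are listed, and the exact formatting of the temperature and budget strings is an assumption based on the f-strings in cell 9.

```python
import json

# Illustrative entry for the "2 Degrees" scenario; Cement maps to scope "S1".
sample_entry = {
    "benchmark_temperature": "2.0 delta_degC",
    "benchmark_global_budget": "1229 Gt CO2",
    "is_AFOLU_included": False,
    "S1": {
        "benchmarks": [
            {
                "sector": "Cement",
                "region": "Global",
                "benchmark_metric": "t CO2/(t Cement)",
                "scenario name": "TPI 2 Degrees",
                "release date": "2021-02-01",
                # Truncated for brevity: the pipeline emits 2019 through 2050.
                "projections_nounits": [
                    {"year": 2048, "value": 0.319},
                    {"year": 2049, "value": 0.312},
                    {"year": 2050, "value": 0.304},
                ],
            }
        ]
    },
}

text = json.dumps(sample_entry, sort_keys=False, indent=2)
print(text)
```

Scopes with no matching sectors are omitted entirely, so a consumer should look up a scope key with `.get()` instead of assuming all four are present.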

### Emit Sector Benchmark Data

In [10]:
output_datadir = os.path.abspath("../data/processed/TPI 20220504")
pathlib.Path(output_datadir).mkdir(parents=True, exist_ok=True)

In [11]:
for scenario_name, bm in ei_bms.items():
    path_name = scenario_name.translate(str.maketrans(" .-", "___", "()")).lower()
    with open(
        pathlib.Path(output_datadir, f"benchmark_EI_TPI_{path_name}.json"), "w"
    ) as f:
        json.dump(round_floats(bm), sort_keys=False, indent=2, fp=f)
        print("", file=f)
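
The file names produced by the loop above follow from the `str.translate` call: spaces, dots and hyphens become underscores, parentheses are deleted, and the result is lower-cased. The sketch below applies the same translation to the five scenario names printed by cell 5. `file_names` is a name introduced here for illustration.

```python
scenario_names = [
    "1.5 Degrees",
    "Below 2 Degrees",
    "2 Degrees (Shift-Improve)",
    "2 Degrees (High Efficiency)",
    "2 Degrees",
]

# Map " ", "." and "-" to "_"; the third argument lists characters to delete.
table = str.maketrans(" .-", "___", "()")

file_names = [
    f"benchmark_EI_TPI_{name.translate(table).lower()}.json"
    for name in scenario_names
]

for file_name in file_names:
    print(file_name)
```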