# ITR Data Template Update

This notebook helps reorganize a Version 1 Sample Data Template into the Version 2 format, which splits financial and emissions reporting across two sheets.

The **Sample Data template** provides both a *Read me* sheet and a data dictionary sheet of *Definitions*, as well as the following input data sheets:
* ITR financial data: The fundamental financial data of companies, listed by security instrument as company id
* ITR emissions and production data: The emissions and production data of companies, listed by security instrument as company id
* ITR target input data: Short-term Emissions or Intensity reduction targets and Net-Zero attainment target dates, listed by company id
* Portfolio: A list of positions and investment value amounts

The user may choose **Benchmark Data** that forecasts the intensity reductions expected from 2020 to 2050 by region and sector. By default we use the OECM benchmark, but two TPI benchmarks are also available. In all three cases we use the same projections for production growth forecasts. We also use the same global carbon budget and TCRE multipliers for all benchmarks.

After scoring, the portfolio is copied to the local file *data_dump.xlsx*, which can be downloaded for further analysis.

This notebook also outputs an **enhanced portfolio** (with temperature scores), which can be aggregated using various weighting methods to gain additional portfolio alignment insights.

Please enjoy learning how the ITR tool works by following the computations performed by this Jupyter Notebook!

## Getting started
Make sure you are running the notebook with the requirements from the example folder installed.

If you see errors when attempting to load the ITR modules, go to the top-level ITR directory, activate the `itr_env` conda environment (using `conda activate itr_env`) and execute the command `pip install -e .`. Then try again, or hit the <i class="fas fa-forward"></i> button above.
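
Before running the rest of the notebook, it can help to check whether a package is importable from the active environment. The sketch below assumes the package is importable under the name `ITR` (an assumption; adjust it if your installation differs). `is_importable` is a hypothetical helper and not part of the tool.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# "ITR" is assumed to be the package's import name
if not is_importable("ITR"):
    print("ITR not found: activate itr_env and run `pip install -e .` in the top-level directory")
```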

In [1]:
import shutil

import pandas as pd
import numpy as np
from math import log10

from openpyxl.worksheet.dimensions import ColumnDimension, DimensionHolder
from openpyxl.styles import Alignment, Border, PatternFill, Side
from openpyxl.styles.colors import Color
from openpyxl.utils import get_column_letter

from datetime import date, datetime

## Download/load the sample template data

We have prepared sample data from public sources so that you can run the tool as it is and familiarise yourself with how it works. To use your own data, please check the [Data Template Requirements](https://github.com/os-c/ITR/blob/main/docs/DataTemplateRequirements.rst) section of the technical documentation for more details on data requirements and formatting.

*The sample data may contain estimates, simplifications, and recategorizations.  It is intended to be generally representative, but not authoritative, and should not be relied upon to make investment decisions.*

In [2]:
# Change these to your filenames

template_data_path_v1 = "data/20220927 ITR Tool Sample Data.xlsx"
template_data_path_v2 = "data/20220927 ITR V_2 Sample Data.xlsx"

We copy the original template file to its new version name. Working on the copy preserves the formatting information and lets us add new sheets easily using `openpyxl`.
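
The copy-then-append pattern can be tried on a throwaway workbook first. This is a minimal sketch, assuming `pandas` and `openpyxl` are installed; the file names and the toy data are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    v1_path = Path(tmp) / "template_v1.xlsx"  # hypothetical file names
    v2_path = Path(tmp) / "template_v2.xlsx"

    # Stand-in for the Version 1 template: a single input sheet
    pd.DataFrame({"company_id": ["A"], "revenue": [1.0]}).to_excel(
        v1_path, sheet_name="ITR input data", index=False
    )

    # Copy first, then open the copy in append mode to add a sheet
    shutil.copyfile(src=v1_path, dst=v2_path)
    with pd.ExcelWriter(v2_path, mode="a", engine="openpyxl") as writer:
        pd.DataFrame({"company_id": ["A"]}).to_excel(
            writer, sheet_name="ITR V2 input data", index=False
        )

    with pd.ExcelFile(v1_path) as xls:
        v1_sheets = list(xls.sheet_names)
    with pd.ExcelFile(v2_path) as xls:
        v2_sheets = list(xls.sheet_names)

print(v1_sheets)
print(v2_sheets)
```

The original file is left untouched, while the copy keeps the old sheet and gains the new one.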

In [3]:
shutil.copyfile(src=template_data_path_v1, dst=template_data_path_v2)
xlsx_writer = pd.ExcelWriter(
    template_data_path_v2, mode="a", engine="openpyxl"
)  # append mode allows us to add a new sheet
wb_xlsx = xlsx_writer.book
wb_data = pd.read_excel(template_data_path_v2, sheet_name=["ITR input data"])

Split the fundamental data into financial and emissions/production data

In [4]:
index_cols = ["company_name", "company_lei", "company_id"]
itr_sheet = wb_data["ITR input data"]  # .set_index(index_cols)
itr_sheet.report_date = itr_sheet.apply(
    lambda x: x.report_date
    if isinstance(x.report_date, datetime)
    else date(int(x.report_date), 12, 31),
    axis=1,
).copy()
all_cols = itr_sheet.columns
scopes = ["s1", "s2", "s1s2", "s3"]

financial_cols = [
    "country",
    "region",
    "sector",
    "exposure",
    "currency",
    "report_date",
    "market_cap",
    "revenue",
    "ev",
    "evic",
    "assets",
]
df = itr_sheet.set_index(index_cols)[financial_cols]
df.insert(df.columns.get_loc("currency") + 1, "fx_quote", "")
df.insert(df.columns.get_loc("currency") + 2, "fx_rate", 1.0)
# pd.isna is more robust than an identity check against np.nan
df.region = df.region.map(lambda x: "" if pd.isna(x) else x)
financial_df = df
display(financial_df)

company_name,company_lei,company_id,country,region,sector,exposure,currency,fx_quote,fx_rate,report_date,market_cap,revenue,ev,evic,assets
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,US,,Electricity Utilities,equity,USD,,1.0,2020-12-31,9.420000e+09,1.018900e+10,8.652000e+09,9.681000e+09,3.364800e+10
"ALLETE, Inc.",549300NNLSIMY6Z8OT86,US0185223007,US,North America,Electricity Utilities,equity,USD,,1.0,2019-12-31,4.285300e+09,1.240500e+09,5.829800e+09,5.899100e+09,5.482800e+09
Alliant Energy,5493009ML300G373MZ12,US0188021085,US,North America,Electricity Utilities,equity,USD,,1.0,2019-12-31,1.160000e+10,3.647700e+09,1.850360e+10,1.851990e+10,1.670070e+10
Ameren Corp.,XRZQ5S7HYJFPHJ78L959,US0236081024,US,North America,Electricity Utilities,equity,USD,,1.0,2019-12-31,1.837877e+10,5.910000e+09,2.780477e+10,2.782077e+10,2.893300e+10
"American Electric Power Co., Inc.",1B4S6S7G0TW5EE83BO58,US0255371017,US,North America,Electricity Utilities,equity,USD,,1.0,2019-12-31,4.349186e+10,1.556140e+10,7.341706e+10,7.366386e+10,7.589230e+10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
WEC Energy Group,549300IGLYTZUK3PVP70,US92939U1060,US,North America,Electricity Utilities,equity,USD,,1.0,2019-12-31,2.630000e+10,7.523100e+09,3.812080e+10,3.815830e+10,3.495180e+10
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,US,North America,Steel,equity,USD,,1.0,2019-12-31,1.633377e+09,3.759556e+09,2.294114e+09,2.386477e+09,2.510796e+09
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,US,North America,Electricity Utilities,equity,USD,,1.0,2019-12-31,3.062935e+10,1.152900e+10,5.060835e+10,5.085635e+10,5.044800e+10
Balfour Beatty,CT4UIJ3TUKGYYHMENQ17,GB0000961622,GB,,Construction Buildings,equity,USD,,1.0,2021-12-31,2.260000e+09,9.690000e+09,1.810000e+09,2.920000e+09,4.846000e+09
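
The `report_date` normalization in the cell above treats a bare year as a fiscal year ending on 31 December and leaves full timestamps untouched. A minimal standalone sketch of that rule follows; `normalize_report_date` is a hypothetical name used only for illustration.

```python
from datetime import date, datetime


def normalize_report_date(value):
    """Keep datetimes as they are; turn a bare year (e.g. 2019) into 31 December of that year."""
    if isinstance(value, datetime):
        return value
    return date(int(value), 12, 31)


print(normalize_report_date(2019))
print(normalize_report_date(datetime(2021, 6, 30)))
```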


In [5]:
production_cols = [c for c in itr_sheet.columns if c.endswith("production")]
metrics_cols = ["emissions_metric", "production_metric"]

Insert an empty column so we can make it easy to add PDF references by hand

In [6]:
df = itr_sheet.copy()
df["2021_pdf"] = ""

The fundamental wide-to-long transformation: each `<year>_<metric>` column is split so that the metric becomes part of the row index and each year becomes its own column.
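
To see the reshape in isolation, here is a minimal sketch on a toy table. The company ids and values are made up; only the column naming pattern (`<year>_<metric>`) mirrors the template.

```python
import pandas as pd

# Toy wide table: one row per company, one column per <year>_<metric>
wide = pd.DataFrame(
    {
        "company_id": ["A", "B"],
        "2019_ghg_s1": [10.0, 20.0],
        "2020_ghg_s1": [9.0, 18.0],
        "2019_production": [100.0, 200.0],
        "2020_production": [110.0, 190.0],
    }
)

# The years are the stubs; everything after the separator becomes the "metric" level
long_df = pd.wide_to_long(
    wide,
    stubnames=["2019", "2020"],
    i="company_id",
    j="metric",
    sep="_",
    suffix=r".*",
)

# Shorten the metric label, as the notebook does for each scope
long_df = long_df.rename(index={"ghg_s1": "s1"}).sort_index()
print(long_df)
```

Each company now has one row per metric, with the years as columns.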

In [7]:
df = pd.wide_to_long(
    df.loc[:, ~df.columns.isin(financial_cols)],
    stubnames=[str(s) for s in list(range(2016, 2023))],
    i=index_cols,
    j="metric",
    sep="_",
    suffix=r".*",
).dropna(how="all", axis=1)
df = df.rename(index={f"ghg_{scope}": scope for scope in scopes}).reset_index("metric")
df.insert(
    df.columns.get_loc("metric"),
    "sub_metric",
    df.apply(
        lambda x: "location"
        if "s2" in x.metric
        else "combined"
        if x.metric == "s3"
        else "",
        axis=1,
    ),
)
df.insert(
    df.columns.get_loc("metric") + 1,
    "unit",
    df.apply(
        lambda x: x.production_metric
        if x.metric == "production"
        else ""
        if x.metric == "pdf"
        else x.emissions_metric,
        axis=1,
    ),
)
df.drop(columns=["production_metric", "emissions_metric"], inplace=True)
df.insert(df.columns.get_loc("unit") + 1, "report_date", date(2021, 12, 31))
df.loc[
    df.metric == "pdf", df.columns[df.columns.get_loc("unit") + 1] : df.columns[-1]
] = ""
df = df.set_index("metric", append=True)
df.columns = df.columns.map(lambda x: int(x) if x[0].isnumeric() else x)
esg_df = df
display(esg_df.iloc[0:10])

company_name,company_lei,company_id,metric,sub_metric,unit,report_date,2016,2017,2018,2019,2020,2021
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,s1,,t CO2,2021-12-31,70457000.0,59804000.0,50291000.0,45611000.0,42961000.0,41202392.0
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,s2,location,t CO2,2021-12-31,306000.0,220000.0,314000.0,324000.0,254000.0,253302.0
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,s1s2,location,t CO2,2021-12-31,70763000.0,60024000.0,50605000.0,45935000.0,43215000.0,41455694.0
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,s3,combined,t CO2,2021-12-31,5864000.0,13871800.0,10071100.0,9973200.0,7269200.0,7351038.0
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,production,,GWh,2021-12-31,104312.0,94148.97,83985.94,75904.355,75271.522,72506.148
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,pdf,,,,,,,,,
"ALLETE, Inc.",549300NNLSIMY6Z8OT86,US0185223007,s1,,Mt CO2,2021-12-31,8.028792,6.56607,6.622019,4.223366,3.750732,4.223366
"ALLETE, Inc.",549300NNLSIMY6Z8OT86,US0185223007,s2,location,Mt CO2,2021-12-31,0.0,0.0,0.0,0.0,0.0,0.0
"ALLETE, Inc.",549300NNLSIMY6Z8OT86,US0185223007,s1s2,location,Mt CO2,2021-12-31,8.028792,6.56607,6.622019,4.223366,3.750732,4.223366
"ALLETE, Inc.",549300NNLSIMY6Z8OT86,US0185223007,s3,combined,Mt CO2,2021-12-31,,,,,,


Define a helper function to estimate how wide to make each column

In [8]:
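def get_cell_width(x):
    # Check for missing values first: NaN is a float, so it would otherwise
    # fall into the float branch below and never reach this test
    if pd.isna(x):
        return 2
    if isinstance(x, float):
        if x > 0:
            # Approximate number of digits before the decimal point
            return log10(x) + 1
        else:
            return 12
    return len(str(x))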
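
The `log10(x) + 1` estimate approximates the number of digits before the decimal point of a positive float. A short sketch of how it behaves follows; `estimated_digits` is a hypothetical name for that one expression.

```python
from math import log10


def estimated_digits(x: float) -> float:
    """Approximate count of digits before the decimal point of a positive number."""
    return log10(x) + 1


for value in (0.5, 7.0, 12345.0, 9.42e9):
    print(value, estimated_digits(value))
```

For values between 0 and 1 the estimate drops below one, so the padding and the header length applied later in the notebook keep the column from collapsing.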

Put pandas values into new worksheet(s).

First up, the fundamental financial data

In [9]:
old_sheet = "ITR input data"
new_sheet = "ITR V2 input data"

df = financial_df
df.to_excel(xlsx_writer, sheet_name=new_sheet)
# resetting the index after writing the dataframe lets us adjust width and keep column consistency with column-based Excel things
df = df.reset_index()
financial_ws = xlsx_writer.sheets[new_sheet]
dim_holder = DimensionHolder(worksheet=financial_ws)
financial_fill = wb_xlsx[old_sheet].cell(row=1, column=1).fill.copy()

# Then make the worksheet pretty
for i, col in enumerate(df.columns):
    if col == "report_date":
        # We fudge the width with number_format that `str(datetime)` doesn't understand
        dim_holder[get_column_letter(i + 1)] = ColumnDimension(
            financial_ws, min=i + 1, max=i + 1, width=len(col) + 2
        )
    else:
        dim_holder[get_column_letter(i + 1)] = ColumnDimension(
            financial_ws,
            min=i + 1,
            max=i + 1,
            width=max(
                df.iloc[:, i].map(lambda x: len(str(x))).max()
                + 2 * (col == "company_lei"),
                len(col),
            )
            + 2,
        )
    if i <= 2:
        # Format index columns
        for j in range(1, financial_ws.max_row + 1):
            financial_ws.cell(column=i + 1, row=j).alignment = Alignment(
                horizontal="left", vertical="center"
            )
    else:
        if col == "report_date":
            for j in range(1, financial_ws.max_row + 1):
                financial_ws.cell(row=j, column=i + 1).number_format = "yyyy-mm-dd"
        financial_ws.cell(column=i + 1, row=1).fill = financial_fill

financial_ws.column_dimensions = dim_holder

# Lighten the Region column, which is optional
region_col_letter = get_column_letter(df.columns.get_loc("region") + 1)
for cell in financial_ws[
    f"{region_col_letter}1:{region_col_letter}{financial_ws.max_row}"
]:
    cell[0].font = cell[0].font.copy(color=Color("FF888888"))
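
The width rule used in the loop above is the longer of the header and the longest rendered value, plus padding. It can be checked without a workbook. This is a minimal sketch on a toy frame; `column_widths` is a hypothetical helper.

```python
import pandas as pd


def column_widths(df: pd.DataFrame, padding: int = 2) -> dict:
    """Width per column: max(longest stringified value, header length) + padding."""
    widths = {}
    for col in df.columns:
        longest_value = df[col].map(lambda v: len(str(v))).max()
        widths[col] = max(longest_value, len(str(col))) + padding
    return widths


toy = pd.DataFrame(
    {"sector": ["Steel", "Electricity Utilities"], "fx_rate": [1.0, 1.0]}
)
widths = column_widths(toy)
print(widths)
```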



Next up, the ESG data

In [10]:
new_sheet = "ITR V2 esg data"
df = esg_df
df.to_excel(xlsx_writer, sheet_name=new_sheet)
# resetting the index after writing the dataframe lets us adjust width and keep column consistency with column-based Excel things
df = df.reset_index()
esg_ws = xlsx_writer.sheets[new_sheet]
dim_holder = DimensionHolder(worksheet=esg_ws)
thin_border = Border(
    left=Side(style="thin", color="FFC6C6C6"),
    right=Side(style="thin", color="FFC6C6C6"),
    top=Side(style="thin", color="FFC6C6C6"),
    bottom=Side(style="thin", color="FFC6C6C6"),
)

# We have only one cell to color, so it's not in the loop
esg_ws.cell(column=df.columns.get_loc("unit") + 1, row=1).fill = (
    wb_xlsx[old_sheet]
    .cell(column=itr_sheet.columns.get_loc("emissions_metric") + 1, row=1)
    .fill.copy()
)

# Make worksheet pretty
for i, col in enumerate(df.columns):
    if col == "report_date":
        # We fudge the width with number_format that `str(datetime)` doesn't understand
        dim_holder[get_column_letter(i + 1)] = ColumnDimension(
            esg_ws, min=i + 1, max=i + 1, width=len(col) + 2
        )
    else:
        dim_holder[get_column_letter(i + 1)] = ColumnDimension(
            esg_ws,
            min=i + 1,
            max=i + 1,
            width=max(
                df.iloc[:, i].map(get_cell_width).max() + 2 * (col == "company_lei"),
                len(str(col)),
            )
            + 2,
        )
    if i <= 2:
        # Format index columns
        for j in range(1, esg_ws.max_row + 1):
            esg_ws.cell(column=i + 1, row=j).alignment = Alignment(
                horizontal="left", vertical="center"
            )
    elif col == "report_date":
        for j in range(1, esg_ws.max_row + 1):
            esg_ws.cell(row=j, column=i + 1).number_format = "yyyy-mm-dd"
    elif col in range(2016, 2023):
        column_color = "EEEEEE" if (col % 2) == 0 else "FFFFFF"
        for j in range(1, esg_ws.max_row + 1):
            esg_ws.cell(column=i + 1, row=j).fill = PatternFill(
                "solid", start_color=column_color
            )
            esg_ws.cell(column=i + 1, row=j).border = thin_border

esg_ws.column_dimensions = dim_holder
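
The year columns are banded by parity: even years get a light grey fill and odd years stay white. A tiny sketch of that choice follows; `year_fill_color` is a hypothetical name.

```python
def year_fill_color(year: int) -> str:
    """Light grey for even years, white for odd years."""
    return "EEEEEE" if year % 2 == 0 else "FFFFFF"


colors = {year: year_fill_color(year) for year in range(2016, 2022)}
print(colors)
```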



In [11]:
# close() saves the workbook and releases the file; ExcelWriter.save() was removed in pandas 2.0
xlsx_writer.close()

Show how to read the data back, and what it looks like

In [12]:
x = pd.read_excel(
    template_data_path_v2, sheet_name="ITR V2 esg data", index_col=[0, 1, 2, 3]
)
display(x)

company_name,company_lei,company_id,metric,sub_metric,unit,report_date,2016,2017,2018,2019,2020,2021
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,s1,,t CO2,2021-12-31,7.045700e+07,5.980400e+07,5.029100e+07,4.561100e+07,4.296100e+07,4.120239e+07
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,s2,location,t CO2,2021-12-31,3.060000e+05,2.200000e+05,3.140000e+05,3.240000e+05,2.540000e+05,2.533020e+05
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,s1s2,location,t CO2,2021-12-31,7.076300e+07,6.002400e+07,5.060500e+07,4.593500e+07,4.321500e+07,4.145569e+07
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,s3,combined,t CO2,2021-12-31,5.864000e+06,1.387180e+07,1.007110e+07,9.973200e+06,7.269200e+06,7.351038e+06
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,production,,GWh,2021-12-31,1.043120e+05,9.414897e+04,8.398594e+04,7.590435e+04,7.527152e+04,7.250615e+04
...,...,...,...,...,...,...,...,...,...,...,...,...
CBRE,52990016II9MJ2OSWA10,US12504L1098,s2,location,t CO2,2021-12-31,2.967800e+04,2.501000e+04,2.443900e+04,2.802000e+04,2.264400e+04,1.984700e+04
CBRE,52990016II9MJ2OSWA10,US12504L1098,s1s2,location,t CO2,2021-12-31,9.309200e+04,7.106700e+04,7.050800e+04,8.679000e+04,8.302300e+04,6.609800e+04
CBRE,52990016II9MJ2OSWA10,US12504L1098,s3,combined,t CO2,2021-12-31,1.695400e+04,1.862600e+04,1.998400e+04,5.830793e+07,5.468473e+07,8.916877e+07
CBRE,52990016II9MJ2OSWA10,US12504L1098,production,,ft**2,2021-12-31,5.300000e+09,5.500000e+09,6.000000e+09,6.800000e+09,7.000000e+09,7.100000e+09
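
With the four-level index in place, a single metric can be pulled out by index level. The sketch below rebuilds two of the AES Corp. rows shown above as a toy frame, so it runs without the Excel file, and selects them with `xs`.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("AES Corp.", "2NUNNB7D43COUIRE5295", "US00130H1059", "s1"),
        ("AES Corp.", "2NUNNB7D43COUIRE5295", "US00130H1059", "production"),
    ],
    names=["company_name", "company_lei", "company_id", "metric"],
)
toy = pd.DataFrame(
    {"unit": ["t CO2", "GWh"], 2020: [42961000.0, 75271.522]},
    index=index,
)

# Cross-section on the "metric" level keeps the three company levels
s1_rows = toy.xs("s1", level="metric")
production_rows = toy.xs("production", level="metric")
print(s1_rows)
print(production_rows)
```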
