# ITR Data Pipeline

The ITR data pipeline organizes and assembles the data needed for the ITR tool.  The data may come from many sources, but the output of this pipeline is a complete, consistent dataset that can be fully interrogated by the ITR tool.  Users who wish to add data or analyze additional portfolio companies must create a new dataset using this pipeline.

These are the data needed to create the ITR dataset:
* Global Parameters (just for reference--we do nothing with them here)
* Region and Country Name Data (borrowed from ESSD dataset)
* Industry Data (Sector Projections aka Benchmarks)
* Portfolio Data (Must cover all the stocks a user may query)
* Company Data (Must cover all companies in all possible portfolio universes)
* Automization (Must cover all years and scenarios a user may query)

Note that the portfolio_universe table goes into an accessible sandbox because it's composed of DERA and other data and useful to all.  Ditto isic_to_sector.  All ITR-specific data goes into {demo_schema} with an {itr_prefix} prefix.
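
As a rough illustration of this naming convention, the sketch below builds fully qualified table names. The helper `qualified_table` and the table name `company_data` are hypothetical; the catalog, schema, and prefix values mirror the ones assigned in the configuration cell later in this notebook, and treating `mdt_sandbox` as the shared sandbox is an assumption based on where those two tables are written.

```python
# Values mirror the notebook's configuration cell.
ingest_catalog = "osc_datacommons_dev"
ingest_schema = "mdt_sandbox"  # assumed to be the shared, accessible sandbox
demo_schema = "demo_dv"
itr_prefix = "itr_"

# Tables that are useful to everyone and therefore live in the sandbox.
SHARED_TABLES = {"portfolio_universe", "isic_to_sector"}


def qualified_table(table: str) -> str:
    """Hypothetical helper: return catalog.schema.table for a logical table name."""
    if table in SHARED_TABLES:
        return f"{ingest_catalog}.{ingest_schema}.{table}"
    # ITR-specific tables get the demo schema and the ITR prefix.
    return f"{ingest_catalog}.{demo_schema}.{itr_prefix}{table}"


print(qualified_table("portfolio_universe"))
print(qualified_table("company_data"))  # "company_data" is an illustrative name
```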

The ITR tool can create secondary datasets:
* Cumulative emissions targets trajectories
* Cumulative emissions budgets
* Target and trajectory overshoot/undershoot ratios
* Target and trajectory temperature scores

These secondary datasets are not the concern of this pipeline.

In [1]:
import os
from pathlib import Path

import numpy as np
import pandas as pd

import boto3
from sqlalchemy import text
import osc_ingest_trino as osc

# import python_pachyderm

# See data-platform-demo/pint-demo.ipynb for quantify/dequantify functions
import warnings  # needed until quantile behaves better with Pint quantities in arrays
from pint import Quantity
from pint_pandas import PintArray
from common_units import ureg

# openscm_units doesn't make it easy to set preprocessors.  This is one way to do it.
ureg.preprocessors = [
    lambda s1: s1.replace("passenger km", "passenger_km"),
    lambda s2: s2.replace("BoE", "boe"),
]

Q_ = ureg.Quantity
PA_ = PintArray

# Load environment variables from credentials.env
osc.load_credentials_dotenv()

Initializing common units...


### Connecting to Trino with sqlalchemy

In the context of the Data Vault, this pipeline operates with full visibility into all the data it prepares for the ITR tool.  When the data is output, it is labeled so that the Data Vault can enforce its data management access rules.

In [2]:
ingest_catalog = "osc_datacommons_dev"
ingest_schema = "mdt_sandbox"
dera_schema = "dera"
dera_prefix = "dera_"
gleif_schema = "sandbox"
rmi_schema = "rmi"
rmi_prefix = ""
iso3166_schema = "mdt_sandbox"
essd_schema = "essd"
essd_prefix = ""
demo_schema = "demo_dv"

itr_prefix = "itr_"

engine = osc.attach_trino_engine(verbose=True, catalog=ingest_catalog)
cxn = engine.connect()

using connect string: trino://MichaelTiemannOSC@trino-secure-odh-trino.apps.odh-cl2.apps.os-climate.org:443/osc_datacommons_dev


### S3 and boto3

In [3]:
s3_source = boto3.resource(
    service_name="s3",
    endpoint_url=os.environ["S3_LANDING_ENDPOINT"],
    aws_access_key_id=os.environ["S3_LANDING_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_LANDING_SECRET_KEY"],
)
source_bucket = s3_source.Bucket(os.environ["S3_LANDING_BUCKET"])

## Global Parameters

These parameters are set/selected by the ITR tool.  They are included here for reference only (the following is not live code).

### Create the ISIC-to-Sector table manually until we have a proper sector mapping table

In [4]:
i2s_df = pd.DataFrame(
    {
        "isic": [
            2410,
            3241,
            3270,
            3272,
            3310,
            3312,
            3317,
            4010,
            4911,
            4931,
            4932,
            4991,
        ],
        "sector": [
            "Steel",
            "Cement",
            "Cement",
            "Cement",
            "Steel",
            "Steel",
            "Steel",
            "Electricity Utilities",
            "Electricity Utilities",
            "Electricity Utilities",
            "Gas Utilities",
            "Electricity Utilities",
        ],
    }
).convert_dtypes()

ingest_table = "isic_to_sector"
drop_table = osc._do_sql(
    f"drop table if exists {ingest_schema}.{ingest_table}", engine, verbose=True
)

columnschema = osc.create_table_schema_pairs(i2s_df)

tabledef = f"""
create table if not exists {ingest_catalog}.{ingest_schema}.{ingest_table}(
{columnschema}
) with (
    format = 'ORC',
    partitioning = array['bucket(isic,20)']
)
"""

qres = osc._do_sql(tabledef, engine, verbose=True)
i2s_df.to_sql(
    ingest_table,
    con=engine,
    schema=ingest_schema,
    if_exists="append",
    index=False,
    method=osc.TrinoBatchInsert(batch_size=2000, verbose=True),
)

drop table if exists mdt_sandbox.isic_to_sector

create table if not exists osc_datacommons_dev.mdt_sandbox.isic_to_sector(
    isic bigint,
    sector varchar
) with (
    format = 'ORC',
    partitioning = array['bucket(isic,20)']
)

constructed fully qualified table name as: "mdt_sandbox.isic_to_sector"
inserting 12 records
  (2410, 'Steel')
  (3241, 'Cement')
  (3270, 'Cement')
  ...
  (4991, 'Electricity Utilities')
batch insert result: [(12,)]
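
The mapping loaded above can also be checked locally before it is written to Trino. The following is a minimal sketch using only the standard library; the codes and sector labels are copied from the cell above, while the helper name `sector_for_isic` is an assumption made for illustration.

```python
from collections import Counter

# Codes and labels copied from the isic_to_sector cell above.
isic_codes = [2410, 3241, 3270, 3272, 3310, 3312, 3317, 4010, 4911, 4931, 4932, 4991]
sectors = [
    "Steel", "Cement", "Cement", "Cement", "Steel", "Steel", "Steel",
    "Electricity Utilities", "Electricity Utilities", "Electricity Utilities",
    "Gas Utilities", "Electricity Utilities",
]

# Both lists must line up one-to-one, and no code may appear twice.
assert len(isic_codes) == len(sectors)
isic_to_sector = dict(zip(isic_codes, sectors))
assert len(isic_to_sector) == len(isic_codes)


def sector_for_isic(code):
    """Hypothetical lookup: return the sector for an ISIC code, or None if unmapped."""
    return isic_to_sector.get(code)


sector_counts = Counter(isic_to_sector.values())
print(sector_counts)
```

A check like this catches a mismatched list length or a duplicated code before the table is dropped and recreated.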


## Portfolio Data

The user will ultimately supply portfolio selection and position information to the ITR tool as part of the weighting calculations.  This part of the pipeline just collects the LEI and ISIN information for companies we should expect to analyze (i.e., companies for which we have fundamental financial information, production, intensity, and target information, in sectors for which we have benchmark projections).

Because this pipeline does the full pre-computation of data for the tool, there is no sense carrying forward information that is not fully closed.  I.e., there's no reason to carry forward an LEI:ISIN relationship if there is no financial, production, or target information related to that LEI and/or ISIN.  The user does not add such data later; the data is collected and fully processed by this pipeline now.
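
One way to picture this "fully closed" rule is as a set intersection. The sketch below is an illustration only: the identifiers are hypothetical placeholders, and it assumes that a company must have financial, production, and target data to be carried forward.

```python
# Hypothetical identifiers standing in for real LEIs and ISINs.
lei_to_isin = {"LEI-A": "ISIN-A", "LEI-B": "ISIN-B", "LEI-C": "ISIN-C"}

# LEIs for which each kind of data is available (illustrative values).
has_financials = {"LEI-A", "LEI-B"}
has_production = {"LEI-A", "LEI-C"}
has_targets = {"LEI-A", "LEI-B", "LEI-C"}

# Assumption: "fully closed" means present in every one of the sets.
closed_leis = has_financials & has_production & has_targets

# Only closed LEI:ISIN relationships are carried forward.
closed_universe = {lei: isin for lei, isin in lei_to_isin.items() if lei in closed_leis}
print(closed_universe)
```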

### Get LEI/ISIN data

RMI hands us data already matched with LEIs and ISINs.  Other lists of company names may require us to stitch that together manually.

In [5]:
# TODO: sort why some notorious utilities are missing LEIs in the following query--bad source data?
# Note: the place to fix the bad data would be osc-ingest-rmi_utility_transition_hub, not here.
rmi_lei_isin = pd.read_sql(
    f"select DISTINCT parent_name, parent_lei, isin from {rmi_schema}.utility_information_2023 where parent_name IS NOT NULL",
    engine,
)
# Fabricate LEIs for entities that have none.  There are 200 or so names with proper LEIs and over 12,000 without.
# Many of those without are subsidiaries of those that do, but we don't have a proper theory as to how to downscale
# financial statistics to those lines of business.  See comments below.
missing_leis = list(
    rmi_lei_isin.loc[rmi_lei_isin.parent_lei.isnull()].parent_name.unique()
)
print(f"A list of 20 (of {len(missing_leis)} entities without valid parent_lei")
print(sorted(missing_leis[0:20]))
rmi_lei_isin.loc[rmi_lei_isin.parent_lei.isnull(), "parent_lei"] = rmi_lei_isin.apply(
    lambda x: f"RMI{x.name:017}", axis=1
)
rmi_lei_isin.loc[rmi_lei_isin["isin"].isnull(), "isin"] = rmi_lei_isin.apply(
    lambda x: f"ZZ{x.name:011}", axis=1
)
# Install LEIs whose hierarchy levels don't match what we matched for SEC DERA data, if any

# It is tempting to consolidate subsidiaries to parents, like this:
# rmi_lei_isin.loc[rmi_lei_isin.parent_name.str.startswith("AES "), "parent_lei"] = "2NUNNB7D43COUIRE5295"
# rmi_lei_isin.loc[rmi_lei_isin.parent_name.str.startswith("AEP "), "parent_lei"] = "1B4S6S7G0TW5EE83BO58"
# But the problem is the subsidiaries don't have data that matches their boundaries.  A single solar farm
# ultimately owned by Southern Company has nothing to do with Southern Company's emissions, targets, or financial data.

rmi_lei_dict = dict(zip(rmi_lei_isin.parent_lei, rmi_lei_isin["isin"], strict=False))

A list of 20 (of 12369 entities without valid parent_lei
['Amerada Hess Corp', 'American PowerNet Mangt, LP', 'Devonshire Energy, LLC', 'EDF Industrial Power Services (NY), LLC', 'En-Touch Energy', 'Energetix', 'Freedom Energy', 'General Power & Light', 'Hino Gas Sales, Inc.', 'Just Energy Group Inc.', 'KeySpan Energy Services Inc', 'New Mexico Natural Gas, Inc.', 'Palmco Power CT, LLC', 'Palmco Power PA, LLC', 'Prier Energy, Inc.', 'Pro Energy Development LLC', 'Pro Energy Marketing', 'Riverstone Holdings-D, L.P.', 'Village of Hilton', 'Wolverine Holdings L.P.']
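
The placeholder identifiers fabricated above follow a simple pattern, sketched here without pandas. The function names are hypothetical; the format strings are the ones used in the cell. Real LEIs are 20 characters and real ISINs are 12, so the `RMI` placeholder has the length of a real LEI while the 13-character `ZZ` placeholder cannot be mistaken for a well-formed ISIN.

```python
def placeholder_lei(row_index: int) -> str:
    """'RMI' followed by the row index zero-padded to 17 digits."""
    return f"RMI{row_index:017}"


def placeholder_isin(row_index: int) -> str:
    """'ZZ' followed by the row index zero-padded to 11 digits."""
    return f"ZZ{row_index:011}"


def looks_fabricated(lei: str) -> bool:
    """Heuristic (an assumption): 'RMI' prefix followed only by digits."""
    return lei.startswith("RMI") and lei[3:].isdigit()


print(placeholder_lei(1), len(placeholder_lei(1)))
print(placeholder_isin(1), len(placeholder_isin(1)))
```

A predicate such as `looks_fabricated` could be used downstream to separate entities with registered LEIs from those with made-up ones.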


In [6]:
ITR_datadir = os.path.abspath("../data/external")

Implement an *ad hoc* ingestion pipeline for the Steel portfolio.  Later we will ingest steel production data.  We use this only to define the universe, not for actual investment information.

In [7]:
steel_idx = pd.read_csv(
    Path(ITR_datadir, "mdt-steel-portfolio.csv"),
    header=0,
    sep=";",
    usecols=["company_name", "company_lei", "company_id"],
    dtype=str,
    engine="c",
)
# display(steel_idx)

Prepare GLEIF matching data for SEC DERA data.  In the future, such matching will use the ESG Entity-Matching pipeline (https://github.com/os-climate/financial-entity-cleaner/tree/version_0.1.0).

In [8]:
gleif_file = s3_source.Object(
    os.environ["S3_LANDING_BUCKET"], "mtiemann-GLEIF/DERA-matches.csv"
)
gleif_file.download_file("/tmp/dera-gleif.csv")
gleif_df = pd.read_csv("/tmp/dera-gleif.csv", header=0, sep=",", dtype=str, engine="c")
gleif_dict = dict(zip(gleif_df.name, gleif_df.LEI, strict=False))
del gleif_df

Create a very simple entity matcher, cleaning up slight variations in company names between RMI's entity names, the SEC's entity names, and GLEIF's entity names.

The commented-out lines are names we would have to fix if there were SEC data for them.  Because there is none, there is nothing for those names to match.

In [9]:
# gleif_dict['Basin Electric Power Coop'.upper()] = gleif_dict['BASIN ELECTRIC POWER COOPERATIVE']
# gleif_dict['Big Rivers Electric Corp'.upper()] = gleif_dict['BIG RIVERS ELECTRIC CORPORATION']
# gleif_dict['CHUGACH ELECTRIC ASSOCIATION INC'] = gleif_dict['CHUGACH ELECTRIC ASSN INC.']
gleif_dict["Cleco Partners LP".upper()] = gleif_dict["CLECO CORPORATE HOLDINGS LLC"]
gleif_dict["CONSTELLATION ENERGY CORP"] = "549300F8Y20RYGNGV346"
gleif_dict["FirstEnergy Co".upper()] = gleif_dict["FIRSTENERGY CORP"]
# gleif_dict['Golden Spread Electric Coop., Inc'.upper()] = gleif_dict['GOLDEN SPREAD ELECTRIC COOPERATIVE, INC.']
gleif_dict["MIDWEST ENERGY INC"] = "549300O4B5CVWMKUES27"
gleif_dict["NORTHWESTERN CORP"] = (
    "254900N1WG46G1VMDM34"  # NORTHWESTERN ENERGY GROUP, INC.
)
gleif_dict["NORTHWESTERN ENERGY GROUP, INC."] = "254900N1WG46G1VMDM34"
gleif_dict["OG&E Energy".upper()] = gleif_dict["OGE ENERGY CORP."]
# gleif_dict['Ohio Valley Electric Corp'.upper()] = gleif_dict['OHIO VALLEY ELECTRIC CORPORATION']
gleif_dict["Old Dominion Electric Coop".upper()] = gleif_dict[
    "OLD DOMINION ELECTRIC COOPERATIVE"
]
gleif_dict["PG&E Corp.".upper()] = gleif_dict["PG&E CORP"]
gleif_dict["Reliant Energy Inc".upper()] = gleif_dict["RELIANT HOLDINGS, INC."]
gleif_dict["Sempra".upper()] = gleif_dict["SEMPRA ENERGY"]
gleif_dict["Tri-State Generation & Transmission Association".upper()] = gleif_dict[
    "TRI-STATE GENERATION & TRANSMISSION ASSOCIATION, INC."
]
gleif_dict["GROUP SIMEC SA DE CV"] = "529900LCYCXPA0TZEU09"
gleif_dict["GRUPO SIMEC, S.A.B. DE C.V."] = gleif_dict["GROUP SIMEC SA DE CV"]
gleif_dict["FRIEDMAN INDUSTRIES INC"] = "549300VI5ADYNC8C3G47"
gleif_dict["LOMA NEGRA COMPANIA INDUSTRIAL ARGENTINA SOCIEDAD ANONIMA"] = (
    "529900VKOQQJ8U9DDK92"
)

gleif_1 = {k.split(",")[0].split(" ")[0]: v for k, v in gleif_dict.items()}
gleif_2 = {" ".join(k.split(",")[0].split(" ")[0:2]): v for k, v in gleif_dict.items()}


def gleif_match(x):
    x = x.split(",")[0]
    if x in gleif_dict:
        return gleif_dict[x]
    x = x.replace(".", "")
    if x in gleif_dict:
        return gleif_dict[x]
    x2 = " ".join(x.split(" ")[0:2])
    if x2 in gleif_2:
        return gleif_2[x2]
    if " " not in x and x in gleif_1:
        return gleif_1[x]
    return None
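
To make the fallback order of the matcher above easier to follow, here is a self-contained sketch of the same strategy on toy data. The company names and LEI strings are hypothetical, and the functions take the lookup tables as arguments rather than using globals, which is a design choice of this sketch rather than of the notebook.

```python
def build_prefix_indexes(name_to_lei):
    """Index LEIs by the first word and by the first two words of each name."""
    first_word, first_two = {}, {}
    for name, lei in name_to_lei.items():
        words = name.split(",")[0].split(" ")
        first_word[words[0]] = lei  # later names overwrite earlier ones
        first_two[" ".join(words[0:2])] = lei
    return first_word, first_two


def match_name(name, name_to_lei, first_word, first_two):
    """Try exact match, then without periods, then two-word and one-word prefixes."""
    name = name.split(",")[0]
    if name in name_to_lei:
        return name_to_lei[name]
    name = name.replace(".", "")
    if name in name_to_lei:
        return name_to_lei[name]
    two = " ".join(name.split(" ")[0:2])
    if two in first_two:
        return first_two[two]
    if " " not in name and name in first_word:
        return first_word[name]
    return None


# Hypothetical names and identifiers, for illustration only.
names = {"EXAMPLE POWER CORP": "LEI-A", "SAMPLE STEEL HOLDINGS, INC.": "LEI-B"}
fw, ft = build_prefix_indexes(names)
for candidate in ["EXAMPLE POWER CORP.", "SAMPLE STEEL GROUP", "EXAMPLE", "UNKNOWN UTILITY CO"]:
    print(candidate, "->", match_name(candidate, names, fw, ft))
```

Prefix matching is deliberately loose: any name that shares its first two words with a known entity will match it, so results from the fallback steps deserve a manual review.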

Collect the universe of company names for the sectors we cover.  Steel sector is SIC 3310-3317. Electricity Utilities is SIC 4911 (but also 4931-4932 and 4991).

Some conglomerates have more general SIC codes that hide their activities in sectors of interest.  Others report those SIC codes within reportable segments.
Without more detailed SEC DERA data (available in an S3 bucket but not yet processed as a pipeline), we cannot collect all of the company names we need.
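
The SIC ranges just described can be expressed as a small classifier. This is a sketch under the assumption that only the ranges named in the text above matter; the function name `sector_for_sic` is hypothetical.

```python
# SIC codes named in the text above.
ELECTRICITY_SICS = {4911, 4931, 4932, 4991}
STEEL_SIC_RANGE = range(3310, 3318)  # 3310 through 3317 inclusive


def sector_for_sic(sic: int):
    """Hypothetical helper: map a SIC code to a covered sector, or None."""
    if sic in STEEL_SIC_RANGE:
        return "Steel"
    if sic in ELECTRICITY_SICS:
        return "Electricity Utilities"
    return None  # not in a sector covered by this sketch


for code in (3312, 4911, 4932, 5311):
    print(code, sector_for_sic(code))
```

A conglomerate filed under a general SIC code would return `None` here, which is exactly the blind spot the paragraph above describes.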

In [10]:
sec_lei_isin = pd.read_sql(
    f"""
select DISTINCT F.name, F.lei, F.sic
from {dera_schema}.financials_by_lei F
where (sic=4911 or sic=4931 or sic=4932 or sic=4991)
      or (sic>=3241 and sic<=3272)
      or (sic>=3310 and sic<=3317)
""",
    engine,
)
sec_lei_isin.lei = sec_lei_isin.name.map(gleif_dict).fillna(sec_lei_isin.lei)

missing_leis = sec_lei_isin[sec_lei_isin.lei.isna()]
sec_lei_isin.dropna(inplace=True)
print("The following companies are missing LEI information and will be dropped:")
display(missing_leis)

The following companies are missing LEI information and will be dropped:


index,name,lei,sic
24,PREMIER HOLDING CORP.,,4911
28,"CORRELATE INFRASTRUCTURE PARTNERS, INC.",,4931
29,AUSCRETE CORP,,3272
31,AQUA POWER SYSTEMS INC.,,4911
51,808 RENEWABLE ENERGY CORP,,4932
54,SMITH MIDLAND CORP,,3272
56,OSSEN INNOVATION CO. LTD.,,3312
59,ASCENT INDUSTRIES CO.,,3317
65,OCEAN THERMAL ENERGY CORP,,4931
85,"MONTAUK RENEWABLES, INC.",,4932


We create a theoretical portfolio that conveniently contains all available LEI and ISIN information, meaning we don't need to do entity matching or ISIN matching.

Other portfolios may need a lot more work before they can be used to precompute other data.  The code above is a sample of the kind of extra data and processing such portfolios need.

In [11]:
rmi_idx = rmi_lei_isin.rename(
    columns={
        "parent_name": "company_name",
        "parent_lei": "company_lei",
        "isin": "company_id",
    }
)
# rmi_idx.insert(1, 'company_lei', portfolio_df.company_name.str.upper().map(gleif_match))
# if rmi_idx.company_lei.isna().any():
#     display(rmi_idx[rmi_idx.company_lei.isna()])
rmi_idx.loc[rmi_idx.company_id.isna(), "company_id"] = rmi_idx.apply(
    lambda x: f"ZZ{x.name:010}", axis=1
)

print(f"Number of RMI portfolio companies = {len(rmi_idx)}")

Number of RMI portfolio companies = 12575


Show list of RMI companies that use made-up LEIs or ISINs

Add Steel company portfolio

In [12]:
portfolio_idx = pd.concat([rmi_idx, steel_idx])
portfolio_idx = portfolio_idx.convert_dtypes()

print(f"Number of total portfolio companies = {len(portfolio_idx)}")

Number of total portfolio companies = 12600


## Company Data

The SIC-to-ISIC table is an open workstream item: https://github.com/os-climate/itr-data-pipeline/issues/1

### Capture a list of the companies for which we have good financial info

We limit our view to the companies in our portfolio.  The user can prioritize whether this is the best source of revenue, market cap, etc., or whether they prefer another source.

Note for future reference: Berkshire Hathaway has one line of business for Energy and another for Steel.  We don't yet have line-of-business info because we use summary data from SEC DERA, not the detailed Notes version of the dataset.

In [13]:
ingest_table = "portfolio_universe"

drop_table = osc._do_sql(
    f"drop table if exists {ingest_schema}.{ingest_table}", engine, verbose=True
)

columnschema = osc.create_table_schema_pairs(portfolio_idx)

tabledef = f"""
create table if not exists {ingest_catalog}.{ingest_schema}.{ingest_table}(
{columnschema}
) with (
    format = 'ORC',
    partitioning = array['bucket(company_lei, 20)']
)
"""
create_table = osc._do_sql(tabledef, engine, verbose=True)
portfolio_idx.to_sql(
    ingest_table,
    con=engine,
    schema=ingest_schema,
    if_exists="append",
    index=False,
    method=osc.TrinoBatchInsert(batch_size=5000, verbose=True),
)

drop table if exists mdt_sandbox.portfolio_universe

create table if not exists osc_datacommons_dev.mdt_sandbox.portfolio_universe(
    company_name varchar,
    company_lei varchar,
    company_id varchar
) with (
    format = 'ORC',
    partitioning = array['bucket(company_lei, 20)']
)

constructed fully qualified table name as: "mdt_sandbox.portfolio_universe"
inserting 5000 records
  ('Dominion Energy', 'ILUL7B6Z54MRYCF6H308', 'US25746U1097')
  ('Just Energy Group Inc.', 'RMI00000000000000001', 'ZZ00000000001')
  ('Consolidated Edison, Inc.', '54930033SBW53OO8T749', 'US2091151041')
  ...
  ('The County of Sonoma', 'RMI00000000000004999', 'ZZ00000004999')
batch insert result: [(5000,)]
inserting 5000 records
  ('Escanaba Operating Services LLC', 'RMI00000000000005000', 'ZZ00000005000')
  ('Chattanooga Metropolitan Airport', 'RMI00000000000005001', 'ZZ00000005001')
  ('DC Water', 'RMI00000000000005002', 'ZZ00000005002')
  ...
  ('ReNew Petra Integrators, LLC', 'RMI00000000000009999',

### Create a list with metric labels embedded in the output for easy reading...

Highlight any rows that have NULL data

In [14]:
qres = cxn.execute(
    text(
        f"""
select F.name, F.lei, T.tname, U2.parent_ticker, F.sic, F.ddate,
       'revenue' as rl, round (F.revenue_usd/1000000.0, 1), round (RT2.fy_revenue_total/1000000.0, 1), round (CS2.fy_revenues/1000000.0, 1), round (F.revenue_usd/RT2.fy_revenue_total, 1), round (F.revenue_usd/CS2.fy_revenues, 1),
       'market_cap' as fl, round (F.market_cap_usd/1000000.0, 1),
       'EV' as el, round ((F.market_cap_usd+F.debt_usd-F.cash_usd)/1000000, 1),
       'assets' as al, round (F.assets_usd/1000000.0, 1), round (AEI2.asset_value/1000000.0, 1), round (F.assets_usd/AEI2.asset_value, 1),
       'cash' as cc, round (F.cash_usd/1000000.0, 1),
       -- 'income' as il, F.income_usd/1000000.0, AEI2.fy_earnings_value/1000000.0, F.income_usd/AEI2.fy_earnings_value,
       'counts: ulei, aei, rt, cs' as legend, c_ulei, c_aei, c_rt, c_cs
from {ingest_schema}.portfolio_universe as P
     join (select count (*) as c_ulei, U.parent_name, U.parent_lei, U.parent_ticker
           from {rmi_schema}.utility_information as U
           group by U.parent_name, U.parent_lei, U.parent_ticker) as U2 on U2.parent_lei=P.company_lei
     join {dera_schema}.financials_by_lei as F on F.lei=P.company_lei
     join (select count (*) as c_cs, CS.parent_name, CS.year, sum(revenues) as fy_revenues
           from {rmi_schema}.customers_sales as CS
           group by CS.parent_name, CS.year) as CS2 on CS2.parent_name=U2.parent_name and CS2.year=year(F.ddate)
     join (select count (*) as c_aei, AEI.parent_name, AEI.year, sum(AEI.asset_value) as asset_value, sum(AEI.earnings_value) as fy_earnings_value
           from {rmi_schema}.assets_earnings_investments as AEI
           group by AEI.parent_name, AEI.year) as AEI2 on AEI2.parent_name=U2.parent_name and AEI2.year=year(F.ddate)
     left join (select count (*) as c_rt, RT.parent_name, RT.year, sum(RT.revenue_total) as fy_revenue_total
           from {rmi_schema}.revenue_by_tech as RT
           group by RT.parent_name, RT.year) as RT2 on RT2.parent_name=U2.parent_name and RT2.year=year(F.ddate)
     left join {dera_schema}.ticker T on F.cik=T.cik and upper(T.tname)=U2.parent_ticker
where year(F.ddate)=2019
order by F.name
"""
    )
)
if qres.returns_rows:
    results = qres.fetchall()
    bad_rows = [x for x in results if any(v is None for v in x)]
    if bad_rows:
        print("bad rows:", bad_rows)
        raise ValueError
    else:
        print(len(results), "rows returned, all good")

35 rows returned, all good
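
A row-level NULL check has to inspect each value in the row. The expression `any(row) is None` can never be true, because `any()` always returns `True` or `False`. The sketch below contrasts the two on hypothetical rows; the helper name `rows_with_nulls` is an assumption.

```python
def rows_with_nulls(rows):
    """Return the rows that contain at least one None value."""
    return [row for row in rows if any(value is None for value in row)]


# Hypothetical result rows; zeros are legitimate values, None marks a SQL NULL.
rows = [
    ("Example Utility", 1.0, 2.0),
    ("Sample Steel", None, 3.0),
    ("Zero Corp", 0, 0.0),
]

bad = rows_with_nulls(rows)
never_matches = [row for row in rows if any(row) is None]

print("rows with NULLs:", bad)
print("rows found by `any(row) is None`:", never_matches)
```

Testing `value is None` rather than truthiness also keeps legitimate zeros, such as the `Zero Corp` row, from being flagged.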


### Capture and print a list of companies with financial info

Financial information is part of the "fundamental data" we need for the ITR portfolio companies.  The other part is base year production, emission, and intensity data.  We query the two separately because we have a unified source of truth for the former (SEC DERA) but multiple sources for the latter (RMI for Electric Utilities and MDT for Steel).

### Financial info:
* Company Name, LEI, ISIN, year
* ISIC Code (for Sector)
* Country and Region
* Revenue, Market Cap, Enterprise Value, Assets, Cash

We currently focus exclusively on data from 2019 as our base year
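
The query in the next cell derives enterprise value (`company_ev`) as market cap plus debt minus cash, and `company_evic` as market cap plus debt. As a worked check, the sketch below applies those two formulas to the AES Corp. and Alcoa Corp. figures that appear in the `financial_df` output further down; the function names are hypothetical.

```python
def enterprise_value(market_cap, debt, cash):
    """EV as computed in the query: market cap + debt - cash."""
    return market_cap + debt - cash


def evic(market_cap, debt):
    """EVIC as computed in the query: market cap + debt."""
    return market_cap + debt


# 2019 figures in USD, taken from the financial_df output in this notebook.
aes = {"market_cap": 10_870_000_000, "debt": 261_000_000, "cash": 1_029_000_000}
alcoa = {"market_cap": 4_300_000_000, "debt": 1_800_000_000, "cash": 879_000_000}

aes_ev = enterprise_value(**aes)
aes_evic = evic(aes["market_cap"], aes["debt"])
alcoa_ev = enterprise_value(**alcoa)
alcoa_evic = evic(alcoa["market_cap"], alcoa["debt"])

print(aes_ev, aes_evic)      # compare with the AES Corp. row
print(alcoa_ev, alcoa_evic)  # compare with the Alcoa Corp. row
```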

In [15]:
base_financial_sql = f"""
select DISTINCT P.company_name, P.company_lei, P.company_id,
       F.country, UN.region_ar6_10 as region,
       if(S2I.isic in (2410, 3310, 3312) or P.company_name='CLEVELAND-CLIFFS INC', 'Steel', if (S2I.isic>=3241 and S2I.isic<=3272, 'Cement', 'Electricity Utilities')) as sector,
       'equity' as exposure, 'USD' as currency,
       year(F.ddate) as year,
       F.market_cap_usd as company_market_cap,
       F.revenue_usd as company_revenue,
       F.market_cap_usd+F.debt_usd-F.cash_usd as company_ev,
       F.market_cap_usd+F.debt_usd as company_evic,
       F.assets_usd as company_total_assets,
       F.cash_usd as company_cash_equivalents,
       F.debt_usd as company_debt
from {ingest_schema}.portfolio_universe as P
     join {dera_schema}.financials_by_lei as F on F.lei=P.company_lei and year(F.ddate)=2019
     join {iso3166_schema}.countries as I on F.country=I.alpha_2
     join {essd_schema}.{essd_prefix}regions as UN on I.alpha_3=UN.iso
     -- join {dera_schema}.{dera_prefix}sub as S on S.cik=F.cik
     -- left join {rmi_schema}.{rmi_prefix}utility_information_2023 as U on U.parent_lei=P.company_lei
     -- left join {gleif_schema}.gleif_isin_lei G on G.lei=P.lei and G.isin=U.isin
     left join {dera_schema}.sic_isic as S2I on S2I.sic=F.sic
     -- left join {rmi_schema}.{rmi_prefix}operations_emissions_by_fuel as E on U.utility_id_eia=E.utility_id_eia and E.year=year(F.ddate)
-- where E.owned_or_total='owned'
group by P.company_name, P.company_lei, P.company_id,
       F.country, UN.region_ar6_10,
       if(S2I.isic in (2410, 3310, 3312) or P.company_name='CLEVELAND-CLIFFS INC', 'Steel', if (S2I.isic>=3241 and S2I.isic<=3272, 'Cement', 'Electricity Utilities')),
       6, 7, -- exposure, currency
       year(F.ddate),
       F.market_cap_usd, F.revenue_usd, F.market_cap_usd+F.debt_usd-F.cash_usd, F.market_cap_usd+F.debt_usd, F.assets_usd, F.cash_usd, F.debt_usd
order by P.company_name
"""

### Emissions/Production info
* Company Name, LEI, ISIN (join axis with financial info)
* Sector (inferred from RMI data as a source rather than ISIC)
* Production (in whatever units -- we need units in either metadata or a column or as part of the data element itself)
* S1, S2, S3 emissions (in megatonnes, Mt, of CO2e)
* S1, S2, S3 emissions intensity (emissions / production, in whatever units this resolves to)

We currently focus exclusively on data from 2019 as our base year

Note that RMI data is S1 (own generation) and S3 (purchased generation); we use zero as S2 value
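
The scope split can be sketched in plain Python. This is a simplified illustration of the aggregation in the next cell, not a reimplementation: the rows are hypothetical, the NOx adjustment used in the SQL is omitted, and returning `None` when generation is zero is an assumption of this sketch. Units are assumed to be Mt CO2 and TWh, matching the Pint types assigned later in the notebook.

```python
# Each row: (owned_energy_source, emissions_co2 in Mt, net_generation in TWh).
# Hypothetical values for one company and one year.
rows = [
    (True, 10.0, 20.0),
    (True, 5.0, 10.0),
    (False, 3.0, 6.0),
]


def summarize(rows):
    """Split emissions into S1 (owned) and S3 (purchased); S2 is set to zero."""
    ghg_s1 = sum(co2 for owned, co2, _ in rows if owned)
    ghg_s3 = sum(co2 for owned, co2, _ in rows if not owned)
    gen_owned = sum(gen for owned, _, gen in rows if owned)
    gen_purchased = sum(gen for owned, _, gen in rows if not owned)
    return {
        "ghg_s1": ghg_s1,
        "ghg_s2": 0.0,
        "ghg_s3": ghg_s3,
        # Intensity = emissions / production for the matching generation.
        "ei_s1": ghg_s1 / gen_owned if gen_owned else None,
        "ei_s2": 0.0,
        "ei_s3": ghg_s3 / gen_purchased if gen_purchased else None,
        "production": gen_owned + gen_purchased,
    }


summary = summarize(rows)
print(summary)
```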

In [16]:
# 'sector', 's1_co2', 's2_co2', 's3_co2', 's1_ei', 's2_ei', 's3_ei', 'production'
rmi_scopes = ["s1", "s2", "s3"]

emissions_sql = f"""
select DISTINCT P.company_name, P.company_lei, P.company_id,
       'Electricity Utilities' as sector, E.year as year,
       sum(if(E.owned_energy_source, E.emissions_co2 + (265/1000000.0)*coalesce(E.emissions_nox, 0), 0)) as ghg_s1,
       0 as ghg_s2,
       sum(if(E.owned_energy_source, 0, E.emissions_co2 + (265/1000000.0)*coalesce(E.emissions_nox, 0))) as ghg_s3,
       sum(if(E.owned_energy_source, E.emissions_co2 + (265/1000000.0)*coalesce(E.emissions_nox, 0), 0)) / sum(if(E.owned_energy_source, E.net_generation, 0)) as ei_s1,
       0 as ei_s2,
       sum(if(E.owned_energy_source, 0, E.emissions_co2 + (265/1000000.0)*coalesce(E.emissions_nox, 0))) / sum(if(E.owned_energy_source, 0, E.net_generation)) as ei_s3,
       sum(E.net_generation) as production
from {ingest_schema}.portfolio_universe as P
     join {rmi_schema}.{rmi_prefix}utility_information_2023 as U on U.parent_lei=P.company_lei
     join {rmi_schema}.{rmi_prefix}operations_emissions_by_fuel as E on U.utility_id_eia=E.utility_id_eia
where E.year>=2014 and E.year<2023
   and P.company_lei!='529900L26LIS2V8PWM23' -- American States Water has negative/zero production values that mess things up
-- and E.owned_or_total='owned'
group by P.company_name, P.company_lei, P.company_id, 3, E.year
order by P.company_name
"""

### `financial_df` contains all the base year (2019) financial, production, and emissions data

For now our benchmark data covers only North America and Europe.  Over time, we expect additional regions (possibly on a per-sector basis).
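
The cell that follows collapses every region label outside a short list into `Global`. A minimal stand-alone sketch of that rule is shown here; the list of kept labels is copied from that cell, while the function name and the sample input `Latin America` are hypothetical.

```python
# Region labels kept as-is by the cell below; everything else becomes "Global".
KEPT_REGIONS = ("Asia", "Europe", "North America")


def normalize_region(region):
    """Return the region unchanged if it is kept, otherwise 'Global'."""
    return region if region in KEPT_REGIONS else "Global"


for label in ("North America", "Europe", "Latin America", None):
    print(label, "->", normalize_region(label))
```

Missing values fall through to `Global` as well, since `None` is not in the kept list.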

In [17]:
financial_df = pd.read_sql(
    base_financial_sql,
    engine,
    index_col=["company_name", "company_lei", "company_id", "sector"],
).convert_dtypes()
financial_df.region = financial_df.region.apply(
    lambda x: x if x in ["Asia", "Europe", "North America"] else "Global"
).astype("string")
financial_df

company_name,company_lei,company_id,sector,country,region,exposure,currency,year,company_market_cap,company_revenue,company_ev,company_evic,company_total_assets,company_cash_equivalents,company_debt
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,US,North America,equity,USD,2019,10870000000.0,10189000000.0,10102000000.0,11131000000.0,33648000000.0,1029000000.0,261000000.0
Alcoa Corp.,549300T12EZ1F6PWWU29,US0138721065,Electricity Utilities,US,North America,equity,USD,2019,4300000000.0,10433000000.0,5221000000.0,6100000000.0,14631000000.0,879000000.0,1800000000.0
Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,US0158577090,Electricity Utilities,CA,North America,equity,USD,2019,,1624921000.0,,,10911470000.0,62485000.0,6500799000.0
"Allete, Inc.",549300NNLSIMY6Z8OT86,US0185223007,Electricity Utilities,US,North America,equity,USD,2019,4285299935.0,1240500000.0,5829799935.0,5899099935.0,5482800000.0,69300000.0,1613800000.0
Alliant Energy,5493009ML300G373MZ12,US0188021085,Electricity Utilities,US,North America,equity,USD,2019,11600000000.0,3647700000.0,18503600000.0,18519900000.0,16700700000.0,16300000.0,6919900000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
WEC Energy Group,549300IGLYTZUK3PVP70,US92939U1060,Electricity Utilities,US,North America,equity,USD,2019,26300000000.0,7523100000.0,38120800000.0,38158300000.0,34951800000.0,37500000.0,11858300000.0
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,Steel,US,North America,equity,USD,2019,1633376617.0,3759556000.0,2294113617.0,2386476617.0,2510796000.0,92363000.0,753100000.0
Walmart Inc.,Y87794H0US1R65VBXU25,US9311421039,Electricity Utilities,US,North America,equity,USD,2019,126810267035.0,514405000000.0,164484267035.0,172206267035.0,219295000000.0,7722000000.0,45396000000.0
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,US,North America,equity,USD,2019,30629347167.0,11529000000.0,50608347167.0,50856347167.0,50448000000.0,248000000.0,20227000000.0


### `rmi_emissions_df` contains the RMI production and emissions data, including the base year (2019)

In [18]:
rmi_emissions_df = (
    pd.read_sql(
        emissions_sql,
        engine,
        index_col=["year", "company_name", "company_lei", "company_id", "sector"],
    )
    .astype("float64")
    .reset_index("year")
)

In [19]:
for scope in rmi_scopes:
    rmi_emissions_df["ghg_" + scope] = rmi_emissions_df["ghg_" + scope].astype(
        "pint[Mt CO2]"
    )
    rmi_emissions_df["ei_" + scope] = rmi_emissions_df["ei_" + scope].astype(
        "pint[Mt CO2/TWh]"
    )
rmi_emissions_df["production"] = rmi_emissions_df["production"].astype("pint[TWh]")
rmi_emissions_df["ghg_s1s2"] = rmi_emissions_df["ghg_s1"] + rmi_emissions_df["ghg_s2"]
rmi_emissions_df["ei_s1s2"] = (
    rmi_emissions_df["ghg_s1s2"] / rmi_emissions_df["production"]
)
rmi_emissions_df["ghg_s1s2s3"] = (
    rmi_emissions_df["ghg_s1s2"] + rmi_emissions_df["ghg_s3"]
)
rmi_emissions_df["ei_s1s2s3"] = (
    rmi_emissions_df["ghg_s1s2s3"] / rmi_emissions_df["production"]
)
template_rmi_df = rmi_emissions_df.pivot(columns="year")

# Put column names into YYYY_metric order (Multi-index has this order inverted)
template_rmi_df.columns = template_rmi_df.columns.map(lambda x: f"{x[1]}_{x[0]}")
template_rmi_df = template_rmi_df.loc[:, ~template_rmi_df.columns.str.contains("_ei_")]
display(template_rmi_df)
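
The last steps of the cell above flatten the pivoted `(metric, year)` column pairs into `YYYY_metric` names and then drop the intensity columns. The same logic can be sketched on plain tuples; the sample column pairs below are illustrative.

```python
# After the pivot, each column is a (metric, year) pair.
columns = [
    ("ghg_s1", 2019),
    ("ghg_s1", 2020),
    ("ei_s1", 2019),
    ("ei_s1s2", 2019),
    ("production", 2019),
]

# Put the year first so the names read YYYY_metric.
flat = [f"{year}_{metric}" for metric, year in columns]

# Drop emissions-intensity columns, which all contain "_ei_" once flattened.
kept = [name for name in flat if "_ei_" not in name]

print(flat)
print(kept)
```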

company_name,company_lei,company_id,sector,2014_ghg_s1,2015_ghg_s1,2016_ghg_s1,2017_ghg_s1,2018_ghg_s1,2019_ghg_s1,2020_ghg_s1,2021_ghg_s1,2014_ghg_s2,2015_ghg_s2,...,2020_ghg_s1s2,2021_ghg_s1s2,2014_ghg_s1s2s3,2015_ghg_s1s2s3,2016_ghg_s1s2s3,2017_ghg_s1s2s3,2018_ghg_s1s2s3,2019_ghg_s1s2s3,2020_ghg_s1s2s3,2021_ghg_s1s2s3
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,36.977487755985116,29.564834483392108,29.408562666634374,19.977555151745413,20.290334522072364,20.49200729256117,17.263204933722637,18.177143675748418,0.0,0.0,...,17.263204933722637,18.177143675748418,48.571252943560964,41.12638722054306,40.84383816361681,32.135931077869095,31.120183650983044,29.627182551675933,24.95834470387006,26.79622227731806
Alberta Investment Management Corp.,549300211OPKUEMQ9F64,ZZ00000010960,Electricity Utilities,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,...,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
Alcoa Corp.,549300T12EZ1F6PWWU29,US0138721065,Electricity Utilities,4.637603152271776,4.6516549108163145,2.8201271436406463,2.1517808248724832,3.6031522505779603,3.6704049297224444,3.985645286116486,4.05842759791932,0.0,0.0,...,3.985645286116486,4.05842759791932,5.099830731577909,5.185576185360823,2.8953268391099796,2.1517808248724832,3.6874406212191566,3.7978925052537966,4.064127421047095,4.10218352886988
Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,US0158577090,Electricity Utilities,2.958116042307808,3.055360728409805,3.4524741020223657,3.71142495761554,3.5132244136592896,3.429365534076304,2.2982828175149916,2.5254019166531996,0.0,0.0,...,2.2982828175149916,2.5254019166531996,4.582467306042801,4.424173503205013,4.566957713596121,4.727247027415712,4.810622254507266,4.505178975871185,3.4037978768860464,4.135326193427748
"Allete, Inc.",549300NNLSIMY6Z8OT86,US0185223007,Electricity Utilities,9.431698097056813,8.480121115065334,8.121633494906257,6.664584684537399,6.726191578793266,4.249347372212299,3.8325463134984714,4.697700796313915,0.0,0.0,...,3.8325463134984714,4.697700796313915,13.671686907236094,12.650368111835146,12.019015745373942,11.446341849277426,11.572221923805639,10.060432911021417,8.686153809771943,9.283692454168078
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Walmart Inc.,Y87794H0US1R65VBXU25,US9311421039,Electricity Utilities,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.9073431385760358,2.1193790304353857,2.1372001746685996,2.133625538267671,2.031364876772328,1.827876549182667,1.5485710295252315,1.5785642188068003
Westfield Gas & Electric Light,549300EHUH3VGBXO8J39,ZZ00000004654,Electricity Utilities,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00618380543742621,0.006579550264310391,0.00560500925742309,0.005784489289026153,0.005266948737002311,0.005382039715383959,0.004718860326715674,0.0046655424561800165
Wolverine Power Supply Coop.,549300ROWOIV5X5MB591,ZZ00000011825,Electricity Utilities,5.174926054261928,3.6130465564968066,4.958073264970476,5.765033255792376,5.794116692598591,5.574060898399985,4.759788834111588,5.044351999925123,0.0,0.0,...,4.759788834111588,5.044351999925123,12.22671924426804,9.946078713875977,11.135181207230616,12.959789717334107,12.29387367711442,11.46735933016727,9.471895499237746,10.178834502122328
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,49.75822408730416,49.25223163785631,45.96985501474195,44.9031849620761,45.41745997571688,41.00170169054415,34.29708429084939,36.41595374820096,0.0,0.0,...,34.29708429084939,36.41595374820096,66.93369267172439,66.14346262777924,59.62625128591675,60.14976048946284,64.15874602909396,58.936242460467824,47.56051248295019,51.31394443610172


### Collect emissions/production info from the MDT Steel data
* Company Name, LEI, ISIN (join axis with financial info)
* Sector (inferred as Steel from source)
* Production (in whatever units -- we need units in either metadata or a column or as part of the data element itself)
* S1, S2, S3 emissions (in whatever units of CO2e)
* S1, S2, S3 emissions intensity (emissions / production, in whatever units this resolves to)

If a company has no emissions or production information, we don't carry it forward (even if it does have revenue, earnings, etc.).
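
The carry-forward rule can be sketched with plain pandas. The company names and numbers below are made up for illustration, and units are left out for brevity; this is only a minimal sketch of dropping rows that have no emissions or production data, not the pipeline's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per company, one column per year.
companies = ["Company A", "Company B", "Company C"]
production = pd.DataFrame(
    {2019: [100.0, np.nan, 50.0], 2020: [110.0, np.nan, np.nan]},
    index=companies,
)
emissions = pd.DataFrame(
    {2019: [200.0, np.nan, 75.0], 2020: [209.0, np.nan, np.nan]},
    index=companies,
)

# Rows with no data in any year are dropped; partially filled rows are kept.
production = production.dropna(axis=0, how="all")
emissions = emissions.dropna(axis=0, how="all")

# Intensity = emissions / production; pandas aligns on company and year.
intensity = (emissions / production).dropna(how="all")
kept_companies = intensity.index.tolist()
```

Here `Company B` has no data at all and disappears, while `Company C` is kept because it reports at least one year.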

In [20]:
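# Read every sheet of the workbook into a dict of DataFrames keyed by sheet name
steel_wb = pd.read_excel(Path(ITR_datadir, "mdt-steel-demo.xlsx"), sheet_name=None)
# Drop empty columns, index by the first three columns (company name, LEI, ID), then drop empty rows
steel_production = steel_wb["Steel Fe_tons"].dropna(axis=1, how="all")
steel_production.set_index(steel_production.columns[0:3].to_list(), inplace=True)
steel_production = steel_production.dropna(axis=0, how="all")
# Attach units so that later arithmetic carries them along
steel_production = steel_production.astype("pint[t Steel]")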
steel_co2 = {}
steel_ei = {}
for scope in rmi_scopes:
    steel_co2[scope] = steel_wb[f"Steel CO2e {scope.upper()}"].dropna(axis=1, how="all")
    steel_co2[scope].set_index(steel_co2[scope].columns[0:3].to_list(), inplace=True)
    steel_co2[scope] = steel_co2[scope].dropna(axis=0, how="all")
    steel_co2[scope] = steel_co2[scope].astype("pint[t CO2]")
    steel_ei[scope] = (steel_co2[scope] / steel_production).dropna(how="all")

In [21]:
def rename_column_emissions(df, scope):
    df = df.loc[:, 2014:2020]
    df.columns = df.columns.map(lambda x: f"{x}_ghg_{scope}")
    return df


template_steel_co2 = pd.concat(
    [rename_column_emissions(steel_co2[scope], scope) for scope in rmi_scopes], axis=1
)
for year in range(2014, 2021):
    template_steel_co2.insert(
        len(template_steel_co2.columns) - 7,
        f"{year}_ghg_s1s2",
        steel_co2["s1"][year] + steel_co2["s2"][year],
    )
for year in range(2014, 2021):
    template_steel_co2.insert(
        len(template_steel_co2.columns),
        f"{year}_ghg_s1s2s3",
        steel_co2["s1"][year] + steel_co2["s2"][year] + steel_co2["s3"][year],
    )

template_steel_co2.columns

Index(['2014_ghg_s1', '2015_ghg_s1', '2016_ghg_s1', '2017_ghg_s1',
       '2018_ghg_s1', '2019_ghg_s1', '2020_ghg_s1', '2014_ghg_s2',
       '2015_ghg_s2', '2016_ghg_s2', '2017_ghg_s2', '2018_ghg_s2',
       '2019_ghg_s2', '2020_ghg_s2', '2014_ghg_s1s2', '2015_ghg_s1s2',
       '2016_ghg_s1s2', '2017_ghg_s1s2', '2018_ghg_s1s2', '2019_ghg_s1s2',
       '2020_ghg_s1s2', '2014_ghg_s3', '2015_ghg_s3', '2016_ghg_s3',
       '2017_ghg_s3', '2018_ghg_s3', '2019_ghg_s3', '2020_ghg_s3',
       '2014_ghg_s1s2s3', '2015_ghg_s1s2s3', '2016_ghg_s1s2s3',
       '2017_ghg_s1s2s3', '2018_ghg_s1s2s3', '2019_ghg_s1s2s3',
       '2020_ghg_s1s2s3'],
      dtype='object')
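
The column labels follow a `{year}_ghg_{scope}` pattern (production columns use `{year}_production`). When downstream code needs the year or scope back, a small helper can split a label into its parts. `parse_template_column` below is a hypothetical helper for illustration and is not part of the notebook.

```python
def parse_template_column(label):
    """Split a template column label into (year, metric, scope).

    Hypothetical helper: scope is None for production columns.
    """
    year, _, rest = label.partition("_")
    metric, _, scope = rest.partition("_")
    return int(year), metric, scope or None


labels = ["2014_ghg_s1", "2020_ghg_s1s2s3", "2019_production"]
parsed = [parse_template_column(label) for label in labels]
```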

In [22]:
template_steel_production = steel_production.loc[:, 2014:2020]
template_steel_production.columns = template_steel_production.columns.map(
    lambda x: f"{x}_production"
)
template_steel_production

company_name,company_lei,company_id,2014_production,2015_production,2016_production,2017_production,2018_production,2019_production,2020_production
AK STEEL HOLDING CORP,529900DT4E7ZNETMVC04,US0015471081,6132700.0,7089200.0,6051800.0,5596200.0,5683400.0,5342200.0,5422332.999999999
ARCELORMITTAL,2EULGUTUI56JI9SAL165,LU0140205948,85100000.0,84600000.0,83900000.0,85200000.0,83900000.0,84500000.0,69100000.0
CARPENTER TECHNOLOGY CORP,DX6I6ZD3X5WNNCDJKP85,US1442851036,138831.0,138831.0,138831.0,138831.0,138831.0,119453.92491467576,93262.41134751771
CLEVELAND-CLIFFS INC,549300TM2WLI2BJMDD86,US1858991011,91232700.0,91689200.0,89951800.0,90796200.0,89583400.0,89842200.0,74522333.0
COMMERCIAL METALS CO,549300OQS2LO07ZJ7N73,US2017231034,5301216.0,5301216.0,5301216.0,5301216.0,5301216.0,5301216.0,5543677.0
GERDAU S.A.,254900YDV6SEQQPZVG24,US3737371050,16100000.0,16100000.0,16100000.0,16100000.0,14276549.5,12453099.0,13142354.3
NIPPON STEEL CORP,35380065QWQ4U2V3PA33,JP3381000003,49580000.0,49580000.0,49580000.0,49580000.0,48500000.0,45890000.0,36630000.0
NUCOR CORP,549300GGJCRSI2TIEJ46,US6703461052,22500000.0,22500000.0,22500000.0,22500000.0,22500000.0,20700000.0,20300000.0
POSCO,988400E5HRVX81AYLM04,KR7005490008,41428000.0,42027000.0,42199000.0,37207000.0,37735000.0,38007000.0,35935000.0
STEEL DYNAMICS INC,549300HGGKEL4FYTTQ83,US8581191009,8529969.0,8529969.0,8529969.0,8529969.0,9074135.0,8793160.0,8925057.399999999


In [23]:
template_steel_df = pd.concat([template_steel_co2, template_steel_production], axis=1)
template_steel_df.insert(0, "sector", "Steel")
template_steel_df.set_index(["sector"], append=True, inplace=True)
template_steel_df.insert(0, "emissions_metric", "t CO2")
template_steel_df.insert(1, "production_metric", "t Steel")
template_steel_df

company_name,company_lei,company_id,sector,emissions_metric,production_metric,2014_ghg_s1,2015_ghg_s1,2016_ghg_s1,2017_ghg_s1,2018_ghg_s1,2019_ghg_s1,2020_ghg_s1,2014_ghg_s2,...,2018_ghg_s1s2s3,2019_ghg_s1s2s3,2020_ghg_s1s2s3,2014_production,2015_production,2016_production,2017_production,2018_production,2019_production,2020_production
CARPENTER TECHNOLOGY CORP,DX6I6ZD3X5WNNCDJKP85,US1442851036,Steel,t CO2,t Steel,374910.254,374910.254,374910.254,374910.254,298055.0,299000.0,236000.0,660000.0,...,,,,138831.0,138831.0,138831.0,138831.0,138831.0,119453.92491467576,93262.41134751771
CLEVELAND-CLIFFS INC,549300TM2WLI2BJMDD86,US1858991011,Steel,t CO2,t Steel,35098923.07076,32771887.7758,33209464.625,32357763.7366,31034981.66376,30349904.4497999,25607731.879518665,4494608.671214038,...,37898202.67945568,36970711.38976185,31130878.442268662,91232700.0,91689200.0,89951800.0,90796200.0,89583400.0,89842200.0,74522333.0
COMMERCIAL METALS CO,549300OQS2LO07ZJ7N73,US2017231034,Steel,t CO2,t Steel,1048006.0,1048006.0,1048006.0,1048006.0,1048006.0,1048006.0,1106156.0,2548437.0,...,,,,5301216.0,5301216.0,5301216.0,5301216.0,5301216.0,5301216.0,5543677.0
GERDAU S.A.,254900YDV6SEQQPZVG24,US3737371050,Steel,t CO2,t Steel,12075000.0,12075000.0,12075000.0,12075000.0,10707412.125,9056519.0,9198407.0,4025000.0,...,,,,16100000.0,16100000.0,16100000.0,16100000.0,14276549.5,12453099.0,13142354.3
NIPPON STEEL CORP,35380065QWQ4U2V3PA33,JP3381000003,Steel,t CO2,t Steel,80501000.0,80501000.0,80501000.0,80501000.0,81099000.0,78384000.0,62860000.0,12478000.0,...,114853000.0,111199000.0,91784000.0,49580000.0,49580000.0,49580000.0,49580000.0,48500000.0,45890000.0,36630000.0
NUCOR CORP,549300GGJCRSI2TIEJ46,US6703461052,Steel,t CO2,t Steel,4800000.0,4800000.0,4800000.0,4800000.0,4800000.0,4400000.0,4700000.0,5785714.285714285,...,18143161.09422492,16727659.574468086,17500000.0,22500000.0,22500000.0,22500000.0,22500000.0,22500000.0,20700000.0,20300000.0
POSCO,988400E5HRVX81AYLM04,KR7005490008,Steel,t CO2,t Steel,84412800.0,82741300.0,81309800.0,75633360.0,77391479.0,79447924.0,75069656.0,4741000.0,...,97401443.0,93402890.0,87600882.0,41428000.0,42027000.0,42199000.0,37207000.0,37735000.0,38007000.0,35935000.0
STEEL DYNAMICS INC,549300HGGKEL4FYTTQ83,US8581191009,Steel,t CO2,t Steel,3215942.0,3215942.0,3215942.0,3215942.0,3299883.0,3145097.0,3063829.9454545453,1700245.0,...,,,,8529969.0,8529969.0,8529969.0,8529969.0,9074135.0,8793160.0,8925057.399999999
TENARIS SA,549300Y7C05BKC4HZB40,US88031M1099,Steel,t CO2,t Steel,2000000.0,2000000.0,2000000.0,2000000.0,2000000.0,1800000.0,1100000.0,1000000.0,...,6200000.0,4900000.0,2800000.0,2900000.0,2900000.0,2900000.0,2900000.0,2900000.0,2900000.0,1800000.0
TERNIUM S.A.,529900QG4KU23TEI2E46,US8808901081,Steel,t CO2,t Steel,17744560.0,17744560.0,17744560.0,17744560.0,17744560.0,16682357.0,15257923.0,858941.0,...,19659711.0,18733673.0,17179700.0,10953432.098765433,10953432.098765433,10953432.098765433,10953432.098765433,10953432.098765433,10297751.2345679,9418470.98765432


In [24]:
pd.options.display.max_rows = 99
pd.options.display.max_columns = 49
template_df = (
    pd.concat([financial_df, pd.concat([template_steel_df, template_rmi_df])], axis=1)
    .dropna(thresh=16)
    .drop(columns=["company_cash_equivalents", "company_debt"])
)
template_df.loc[
    pd.IndexSlice[:, :, :, ["Electricity Utilities"]],
    ["emissions_metric", "production_metric"],
] = [
    "Mt CO2",
    "TWh",
]
template_df = template_df.reset_index()
cols = template_df.columns.tolist()

In [25]:
# Move "sector" (position 3) so that it follows "country" and "region"
cols = cols[:3] + cols[4:6] + [cols[3]] + cols[6:]
template_df = template_df[cols]
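# After each 2021 column, insert a 2022 placeholder column of NaN quantities in the row's own units
for col in cols: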
    if col.startswith("2021_"):
        col_index = template_df.columns.get_loc(col)
        for year in [2022]:
            newcol = col.replace("2021", str(year))
            newvals = template_df[
                "emissions_metric" if "_ghg_" in newcol else "production_metric"
            ].map(lambda x: Q_(np.nan, x))
            template_df.insert(col_index + 1, newcol, newvals)
display(template_df.sample(15, random_state=0))
pd.reset_option("display.max_rows")
pd.reset_option("display.max_columns")

Unnamed: 0,company_name,company_lei,company_id,country,region,sector,exposure,currency,year,company_market_cap,company_revenue,company_ev,company_evic,company_total_assets,emissions_metric,production_metric,2014_ghg_s1,2015_ghg_s1,2016_ghg_s1,2017_ghg_s1,2018_ghg_s1,2019_ghg_s1,2020_ghg_s1,2014_ghg_s2,...,2016_ghg_s1s2s3,2017_ghg_s1s2s3,2018_ghg_s1s2s3,2019_ghg_s1s2s3,2020_ghg_s1s2s3,2014_production,2015_production,2016_production,2017_production,2018_production,2019_production,2020_production,2021_ghg_s1,2022_ghg_s1,2021_ghg_s2,2022_ghg_s2,2021_ghg_s3,2022_ghg_s3,2021_production,2022_production,2021_ghg_s1s2,2022_ghg_s1s2,2021_ghg_s1s2s3,2022_ghg_s1s2s3
18,CMS Energy Corp.,549300IA9XFBAGNIBW29,US1258961002,US,North America,Electricity Utilities,equity,USD,2019.0,16352000000.0,6845000000.0,28163000000.0,28303000000.0,26837000000.0,Mt CO2,TWh,18.177669716019107 CO2 * megametric_ton,19.427180505429664 CO2 * megametric_ton,16.14414499288306 CO2 * megametric_ton,14.289806719879884 CO2 * megametric_ton,14.513100615787744 CO2 * megametric_ton,14.844826496647247 CO2 * megametric_ton,12.90304881923237 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,24.385167717768674 CO2 * megametric_ton,22.616963209434147 CO2 * megametric_ton,24.208775787864106 CO2 * megametric_ton,23.410207714152264 CO2 * megametric_ton,21.654329216633055 CO2 * megametric_ton,43.36078495373647 terawatt_hour,46.03069160054101 terawatt_hour,46.86379958905408 terawatt_hour,45.803073459704926 terawatt_hour,48.928961740923484 terawatt_hour,49.496822308654565 terawatt_hour,48.81712357519342 terawatt_hour,15.17566103939086,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,8.149314842005083,nan CO2 * megametric_ton,49.764548652692845,nan terawatt_hour,15.17566103939086,nan CO2 * megametric_ton,23.324975881395943,nan CO2 * megametric_ton
169,Orlando Utilities Commision,549300EJR7JVMRXL5D66,ZZ00000004895,,,Electricity Utilities,,,,,,,,,Mt CO2,TWh,3.7784299748949755 CO2 * megametric_ton,6.073025058184463 CO2 * megametric_ton,4.15389660929843 CO2 * megametric_ton,4.4912260318457955 CO2 * megametric_ton,4.555435472519305 CO2 * megametric_ton,4.291729561930882 CO2 * megametric_ton,3.829840028026398 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,4.816871983325711 CO2 * megametric_ton,5.060608975886327 CO2 * megametric_ton,5.124006436154483 CO2 * megametric_ton,4.8604069010112765 CO2 * megametric_ton,4.332880555137898 CO2 * megametric_ton,7.445165773479881 terawatt_hour,7.153767729384876 terawatt_hour,7.862031936286599 terawatt_hour,7.766830045448359 terawatt_hour,7.934510676730146 terawatt_hour,7.735373646549086 terawatt_hour,7.579494703874107 terawatt_hour,4.108419971041707,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,0.5650615353168269,nan CO2 * megametric_ton,7.979950076691649,nan terawatt_hour,4.108419971041707,nan CO2 * megametric_ton,4.673481506358534,nan CO2 * megametric_ton
106,Basin Electric Power Coop.,5493002CLOJ5KYT5GB16,ZZ00000006557,,,Electricity Utilities,,,,,,,,,Mt CO2,TWh,17.67876373883121 CO2 * megametric_ton,19.936606676316732 CO2 * megametric_ton,18.86420522572634 CO2 * megametric_ton,19.233686850142135 CO2 * megametric_ton,19.389832659977635 CO2 * megametric_ton,18.101269719395113 CO2 * megametric_ton,17.02153247242067 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,23.43885992860362 CO2 * megametric_ton,24.735267424408207 CO2 * megametric_ton,24.49018520275841 CO2 * megametric_ton,23.762794405064334 CO2 * megametric_ton,22.542691812400673 CO2 * megametric_ton,29.851906855104556 terawatt_hour,31.45853820316792 terawatt_hour,30.848216904645973 terawatt_hour,32.92137979869034 terawatt_hour,32.9630367248235 terawatt_hour,32.68732917188635 terawatt_hour,32.17137385740698 terawatt_hour,16.083751878030313,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,6.751354282799905,nan CO2 * megametric_ton,33.33991995232204,nan terawatt_hour,16.083751878030313,nan CO2 * megametric_ton,22.835106160830215,nan CO2 * megametric_ton
92,Walmart Inc.,Y87794H0US1R65VBXU25,US9311421039,US,North America,Electricity Utilities,equity,USD,2019.0,126810267035.0,514405000000.0,164484267035.0,172206267035.0,219295000000.0,Mt CO2,TWh,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,2.1372001746685996 CO2 * megametric_ton,2.133625538267671 CO2 * megametric_ton,2.031364876772328 CO2 * megametric_ton,1.827876549182667 CO2 * megametric_ton,1.5485710295252315 CO2 * megametric_ton,4.215471999999999 terawatt_hour,4.489331999999999 terawatt_hour,5.056061999999999 terawatt_hour,5.031891000000002 terawatt_hour,4.962949999999999 terawatt_hour,4.706187 terawatt_hour,4.421713 terawatt_hour,0.0,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,1.5785642188068003,nan CO2 * megametric_ton,4.516571,nan terawatt_hour,0.0,nan CO2 * megametric_ton,1.5785642188068003,nan CO2 * megametric_ton
176,PUD No 1 of Cowlitz County,TS0L3NF45PNBQM207L23,ZZ00000005628,,,Electricity Utilities,,,,,,,,,Mt CO2,TWh,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,2.1348994155895538 CO2 * megametric_ton,2.0655986382452145 CO2 * megametric_ton,1.984545599092852 CO2 * megametric_ton,1.7848975227855208 CO2 * megametric_ton,1.5332030385799587 CO2 * megametric_ton,5.268150936363637 terawatt_hour,5.094207130987291 terawatt_hour,5.302975403714564 terawatt_hour,5.144531283254516 terawatt_hour,5.1492539902545165 terawatt_hour,4.942655938980074 terawatt_hour,4.75593366225137 terawatt_hour,0.0,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,1.5848581051316857,nan CO2 * megametric_ton,4.878705049513586,nan terawatt_hour,0.0,nan CO2 * megametric_ton,1.5848581051316857,nan CO2 * megametric_ton
183,Siemens Aktiengesellschaft,W38RGI023J3WT1HWRP32,DE0007236101,,,Electricity Utilities,,,,,,,,,Mt CO2,TWh,nan CO2 * megametric_ton,nan CO2 * megametric_ton,nan CO2 * megametric_ton,nan CO2 * megametric_ton,nan CO2 * megametric_ton,nan CO2 * megametric_ton,0.7914836990647104 CO2 * megametric_ton,nan CO2 * megametric_ton,...,nan CO2 * megametric_ton,nan CO2 * megametric_ton,nan CO2 * megametric_ton,nan CO2 * megametric_ton,0.7914836990647104 CO2 * megametric_ton,nan terawatt_hour,nan terawatt_hour,nan terawatt_hour,nan terawatt_hour,nan terawatt_hour,nan terawatt_hour,3.93897309 terawatt_hour,2.271491276591783,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,6.37236019,nan terawatt_hour,2.271491276591783,nan CO2 * megametric_ton,2.271491276591783,nan CO2 * megametric_ton
5,Alphabet Inc.,5493006MHB84DD0ZWV18,US02079K1079,US,North America,Electricity Utilities,equity,USD,2019.0,663000000000.0,161857000000.0,654716000000.0,673214000000.0,275909000000.0,Mt CO2,TWh,0.0427813428272884 CO2 * megametric_ton,0.0660013746622876 CO2 * megametric_ton,0.0683544613464463 CO2 * megametric_ton,0.0645142611696245 CO2 * megametric_ton,0.0704398315815089 CO2 * megametric_ton,0.0651133124800371 CO2 * megametric_ton,0.0764629219741582 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,0.0683544613464463 CO2 * megametric_ton,0.0645142611696245 CO2 * megametric_ton,0.0704398315815089 CO2 * megametric_ton,0.0651133124800371 CO2 * megametric_ton,0.0764629219741582 CO2 * megametric_ton,0.2898219999999997 terawatt_hour,0.6531219999999995 terawatt_hour,0.7030389999999997 terawatt_hour,0.7201379999999998 terawatt_hour,0.7958559999999992 terawatt_hour,0.7722139999999998 terawatt_hour,0.8563009999999999 terawatt_hour,0.0722963132576992,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,0.7397159999999995,nan terawatt_hour,0.0722963132576992,nan CO2 * megametric_ton,0.0722963132576992,nan CO2 * megametric_ton
139,Gainesville Regional Utilities,549300QVH6UAIPUV5M94,ZZ00000006866,,,Electricity Utilities,,,,,,,,,Mt CO2,TWh,1.0191690974608192 CO2 * megametric_ton,1.1260292333997264 CO2 * megametric_ton,1.0872308978650205 CO2 * megametric_ton,0.9542227478011488 CO2 * megametric_ton,1.2896240484370025 CO2 * megametric_ton,1.0403437008829244 CO2 * megametric_ton,0.9546922232940308 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,1.3042850927090073 CO2 * megametric_ton,1.2323985069743377 CO2 * megametric_ton,1.3501518004015023 CO2 * megametric_ton,1.1069676586355697 CO2 * megametric_ton,1.021703261954694 CO2 * megametric_ton,2.1473220029999993 terawatt_hour,2.2994419999999995 terawatt_hour,2.296131 terawatt_hour,2.327001999999999 terawatt_hour,2.1568889999999987 terawatt_hour,2.080059999999999 terawatt_hour,2.0665439999999995 terawatt_hour,0.9640624683156808,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,0.0534449089991189,nan CO2 * megametric_ton,2.0910029999999997,nan terawatt_hour,0.9640624683156808,nan CO2 * megametric_ton,1.0175073773147998,nan CO2 * megametric_ton
12,"Berkshire Hathaway, Inc.",5493000C01ZX7D35SD85,US0846707026,US,North America,Electricity Utilities,equity,USD,2019.0,417300000000.0,254616000000.0,,421014902807.7753,817729000000.0,Mt CO2,TWh,77.55273072460989 CO2 * megametric_ton,73.07114055308234 CO2 * megametric_ton,64.04279695657775 CO2 * megametric_ton,63.33968896640881 CO2 * megametric_ton,63.088786470129214 CO2 * megametric_ton,52.301989706117176 CO2 * megametric_ton,54.33954424694405 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,73.82670244513578 CO2 * megametric_ton,74.09706413661381 CO2 * megametric_ton,71.91834020566681 CO2 * megametric_ton,61.20049339326833 CO2 * megametric_ton,63.18888047627185 CO2 * megametric_ton,174.05285573995002 terawatt_hour,180.3072135474682 terawatt_hour,161.8463329147028 terawatt_hour,163.310821769237 terawatt_hour,163.4962010735802 terawatt_hour,145.37942481317472 terawatt_hour,158.28062252397845 terawatt_hour,59.89214172465692,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,8.801292821807161,nan CO2 * megametric_ton,167.3129569500017,nan terawatt_hour,59.89214172465692,nan CO2 * megametric_ton,68.69343454646408,nan CO2 * megametric_ton
160,Municipal Electric Authority Of Georgia,JA0WNILDDF2KUPS83B16,ZZ00000006820,,,Electricity Utilities,,,,,,,,,Mt CO2,TWh,4.146501399626316 CO2 * megametric_ton,4.219719697334429 CO2 * megametric_ton,4.185913654771107 CO2 * megametric_ton,3.6002514128237917 CO2 * megametric_ton,3.4838345418290233 CO2 * megametric_ton,3.556128345814166 CO2 * megametric_ton,2.063653909398291 CO2 * megametric_ton,0.0 CO2 * megametric_ton,...,4.561105619473492 CO2 * megametric_ton,4.013219086678739 CO2 * megametric_ton,4.00092587043941 CO2 * megametric_ton,4.011096550505103 CO2 * megametric_ton,2.4237592676555177 CO2 * megametric_ton,13.316010571465593 terawatt_hour,13.710466406780345 terawatt_hour,13.786633609177992 terawatt_hour,13.060415015621775 terawatt_hour,12.98885990099217 terawatt_hour,13.111187558871125 terawatt_hour,11.682195628975364 terawatt_hour,2.392716186525119,nan CO2 * megametric_ton,0.0,nan CO2 * megametric_ton,0.4376840827962405,nan CO2 * megametric_ton,12.411924600041091,nan terawatt_hour,2.392716186525119,nan CO2 * megametric_ton,2.8304002693213595,nan CO2 * megametric_ton


In [26]:
with pd.ExcelWriter(
    "../data/processed/template-20220415-output.xlsx", datetime_format="YYYY"
) as writer:
    template_df.to_excel(writer, sheet_name="ITR input data", index=False)

### Load emissions target data

The RMI power plant data is valid for Scope 1 emissions only.

In [27]:
osc._do_sql(
    f"describe {rmi_schema}.{rmi_prefix}emissions_targets", engine, verbose=False
)

[('parent_name', 'varchar', '', ''),
 ('utility_name', 'varchar', '', ''),
 ('respondent_id', 'integer', '', ''),
 ('year', 'integer', '', ''),
 ('target_scope', 'varchar', '', ''),
 ('target_type', 'varchar', '', ''),
 ('state', 'varchar', '', ''),
 ('co2_historical', 'double', '', ''),
 ('co2_target', 'double', '', ''),
 ('co2_target_all_years', 'double', '', ''),
 ('co2_1point5c', 'double', '', ''),
 ('generation_historical', 'double', '', ''),
 ('generation_projected', 'double', '', ''),
 ('generation_1point5c', 'double', '', ''),
 ('co2_intensity_historical', 'double', '', ''),
 ('co2_intensity_target', 'double', '', ''),
 ('co2_intensity_target_all_years', 'double', '', ''),
 ('co2_intensity_1point5c', 'double', '', '')]

### `targets_df` has all the historical and target emissions data
### `trajectory_df` is derived from historical target emissions data

We also preserve RMI's 1.5 degree target info, which can be presented as a trajectory to compare and contrast corporate targets with RMI's best policy recommendations.
* rtg_df is the RMI contribution to targets_df (RMI data frame)
* mtg_df is the Steel contribution to targets_df (MDT data frame)

We do not consider targets/emissions for WIRES ONLY utilities (which have no generation of their own).

We set the LEI information based on our hand-curated GLEIF table, not the LEI info in the RMI and SEC data tables.
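
As a minimal sketch of that lookup, the example below maps upper-cased company names to LEIs through a plain dictionary. `gleif_match_example` is a hypothetical one-entry stand-in for the hand-curated table, and "Unlisted Utility" is a made-up name used to show what happens when no match exists.

```python
import pandas as pd

# Hypothetical stand-in for the hand-curated table: upper-cased name -> LEI.
gleif_match_example = {"EXELON CORP.": "3SOUA6IRML7435B56G12"}

df = pd.DataFrame({"company_name": ["Exelon Corp.", "Unlisted Utility"]})
df.insert(1, "company_lei", df["company_name"].str.upper().map(gleif_match_example))

# Names missing from the curated table map to NaN and can be reviewed by hand.
unmatched = df.loc[df["company_lei"].isna(), "company_name"].tolist()
```

Upper-casing before the lookup makes the match insensitive to capitalization differences between data sources.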

In [28]:
# Emissions targets are now broken out by state, but we care more about rolling them up to the company level.
# Therefore we sum absolutes (emissions and generation) and re-compute intensities from the aggregated amounts.

rtg_df = pd.read_sql(
    f"""
select ET.parent_name as company_name, ET.utility_name, 'Electricity Utilities' as sector, ET.year as year,
       target_scope,
       sum(co2_target) as co2_target,
       sum(co2_historical) as co2_historical,
       sum(co2_target_all_years) as co2_target_all_years,
       sum(co2_1point5C) as co2_1point5C,
       sum(generation_historical) as production_historical,
       sum(generation_projected) as production_projected,
       sum(generation_1point5C) as production_1point5C
from {rmi_schema}.{rmi_prefix}emissions_targets ET
     join (select utility_name, year
           from {rmi_schema}.{rmi_prefix}operations_emissions_by_tech
           where technology_eia!='Batteries' and technology_eia!='Hydroelectric Pumped Storage'
           group by utility_name, year) EM
           on ET.utility_name=EM.utility_name and ((ET.year>2020 and EM.year=2020) or (ET.year=EM.year) or ((ET.year<2005 and EM.year=2005) ))
     -- join (select parent_name, parent_lei from {rmi_schema}.{rmi_prefix}utility_information_2023 group by parent_name, parent_lei) U
     --       on ET.parent_name=U.parent_name
     -- join {dera_schema}.financials_by_lei as F on F.lei=U.parent_lei
where ET.target_type='All'
group by ET.parent_name, ET.utility_name, ET.year, ET.target_scope
order by company_name, year
""",
    engine,
)  # parse_dates=['year']
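
Re-computing intensity from aggregated amounts matters because a simple average of per-state intensities ignores how much each state generates. The sketch below uses made-up numbers for a hypothetical company ("Utility X") with two state-level rows; it illustrates the arithmetic only and does not touch the RMI tables.

```python
import pandas as pd

# Hypothetical state-level rows for one company and year (Mt CO2, TWh).
by_state = pd.DataFrame(
    {
        "company_name": ["Utility X", "Utility X"],
        "year": [2020, 2020],
        "co2": [9.0, 1.0],
        "generation": [10.0, 30.0],
    }
)

# Naive approach: average the per-state intensities (ignores generation weights).
by_state["ei"] = by_state["co2"] / by_state["generation"]
naive_mean = by_state["ei"].mean()

# Approach described above: sum the absolutes, then derive intensity from the sums.
company = by_state.groupby(["company_name", "year"])[["co2", "generation"]].sum()
company["ei"] = company["co2"] / company["generation"]
aggregated_ei = company.loc[("Utility X", 2020), "ei"]
```

With these numbers the aggregated intensity is 10 / 40 = 0.25 Mt CO2/TWh, whereas the unweighted mean of the two state intensities is roughly 0.47.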

In [29]:
# We set the LEI information based on our hand-curated GLEIF table, not the LEI info in the RMI and SEC data tables
rtg_df.insert(1, "company_lei", rtg_df.company_name.str.upper().map(gleif_match))
rtg_df.insert(2, "company_id", rtg_df.company_lei.map(rmi_lei_dict))
rtg_df = rtg_df[
    rtg_df.company_lei != "529900L26LIS2V8PWM23"
]  # American States Water has negative/zero production values that mess things up

rtg_df.loc[rtg_df.production_historical.gt(0), "ei_historical"] = (
    rtg_df.co2_historical / rtg_df.production_historical
)
rtg_df["production_general"] = (
    rtg_df[["production_historical", "production_projected"]].bfill(axis=1).iloc[:, 0]
)
rtg_df.loc[rtg_df.production_general.gt(0), "ei_target"] = (
    rtg_df.co2_target / rtg_df.production_general
)
rtg_df.loc[rtg_df.production_general.gt(0), "ei_target_all_years"] = (
    rtg_df.co2_target_all_years / rtg_df.production_general
)
rtg_df.loc[rtg_df.production_1point5C.gt(0), "ei_1point5C"] = (
    rtg_df.co2_1point5C / rtg_df.production_1point5C
)
rtg_df.drop(columns="production_general", inplace=True)
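
The `bfill(axis=1).iloc[:, 0]` idiom used for `production_general` picks the historical value when it exists and falls back to the projected value otherwise. A small sketch with made-up numbers (plain floats rather than Pint quantities):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "production_historical": [40.0, np.nan, 41.0, np.nan],
        "production_projected": [np.nan, 42.0, 45.0, np.nan],
    }
)

# Back-fill across columns, then take the first column: the historical value
# when present, otherwise the projected one, otherwise NaN.
production_general = (
    df[["production_historical", "production_projected"]].bfill(axis=1).iloc[:, 0]
)
```

The column order in the selection sets the priority, so listing `production_historical` first is what makes it win when both values are present.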

In [30]:
for col in rtg_df.columns:
    if col.startswith("co2_"):
        rtg_df[col] = rtg_df[col].astype("pint[Mt CO2]")
    elif col.startswith("production_"):
        rtg_df[col] = rtg_df[col].astype("pint[TWh]")
    elif col.startswith("ei_"):
        rtg_df[col] = rtg_df[col].astype("pint[Mt CO2/TWh]")
rtg_df = rtg_df.convert_dtypes()
print(rtg_df.dtypes)
print(f"len(rtg_df) = {len(rtg_df)}")

company_name                                         string[python]
company_lei                                          string[python]
company_id                                           string[python]
utility_name                                         string[python]
sector                                               string[python]
year                                                          Int64
target_scope                                         string[python]
co2_target                               pint[CO2 * megametric_ton]
co2_historical                           pint[CO2 * megametric_ton]
co2_target_all_years                     pint[CO2 * megametric_ton]
co2_1point5C                             pint[CO2 * megametric_ton]
production_historical                           pint[terawatt_hour]
production_projected                            pint[terawatt_hour]
production_1point5C                             pint[terawatt_hour]
ei_historical            pint[CO2 * megametric_t

The RMI targets cover only S1 and S3, so there is no S2 to compute; we set S2 to zero until RMI provides such data.

In [31]:
def compute_sums_and_wavg(x):
    zero_Mt_CO2 = Q_(0.0, "Mt CO2")
    d = {
        "co2_s1_by_year": x[x.target_scope == "Scope 1"]["co2_target_all_years"].sum(),
        "co2_s2_by_year": zero_Mt_CO2,
        "co2_s3_by_year": x[x.target_scope == "Scope 3"]["co2_target_all_years"].sum(),
        "production_by_year": x[["production_historical", "production_projected"]]
        .bfill(axis=1)
        .iloc[:, 0]
        .sum(),
    }
    return pd.Series(
        d,
        index=[
            "co2_s1_by_year",
            "co2_s2_by_year",
            "co2_s3_by_year",
            "production_by_year",
        ],
    )


with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    rmi_targets_df = (
        rtg_df[rtg_df.year >= 2014]
        .groupby(
            ["company_name", "company_lei", "company_id", "sector", "year"]
        )  # grouping automagically sets index
        .apply(compute_sums_and_wavg)
        .sort_values(["company_name", "year"], ascending=[True, True])
    )
m = rmi_targets_df.production_by_year != 0

In [32]:
rmi_targets_df.loc[~m, "ei_s1_by_year"] = m.map(lambda x: Q_(np.nan, "Mt CO2/TWh"))
rmi_targets_df.loc[~m, "ei_s2_by_year"] = m.map(lambda x: Q_(np.nan, "Mt CO2/TWh"))
rmi_targets_df.loc[~m, "ei_s1s2_by_year"] = m.map(lambda x: Q_(np.nan, "Mt CO2/TWh"))
rmi_targets_df.loc[~m, "ei_s3_by_year"] = m.map(lambda x: Q_(np.nan, "Mt CO2/TWh"))
rmi_targets_df.loc[~m, "ei_s1s2s3_by_year"] = m.map(lambda x: Q_(np.nan, "Mt CO2/TWh"))
rmi_targets_df.loc[m, "ei_s1_by_year"] = (
    rmi_targets_df.co2_s1_by_year / rmi_targets_df.production_by_year
)
rmi_targets_df.loc[m, "ei_s2_by_year"] = (
    rmi_targets_df.co2_s2_by_year / rmi_targets_df.production_by_year
)
rmi_targets_df.loc[m, "ei_s1s2_by_year"] = (
    rmi_targets_df.co2_s1_by_year + rmi_targets_df.co2_s2_by_year
) / rmi_targets_df.production_by_year
rmi_targets_df.loc[m, "ei_s3_by_year"] = (
    rmi_targets_df.co2_s3_by_year / rmi_targets_df.production_by_year
)
rmi_targets_df.loc[m, "ei_s1s2s3_by_year"] = (
    rmi_targets_df.co2_s1_by_year
    + rmi_targets_df.co2_s2_by_year
    + rmi_targets_df.co2_s3_by_year
) / rmi_targets_df.production_by_year
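
The mask `m` keeps the division away from rows with zero production, so those rows end up with NaN intensity. The same pattern is shown below with plain floats and made-up values; the real cell works on Pint quantities, which this sketch leaves out.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"co2_s1": [10.0, 5.0, 0.0], "production": [20.0, 0.0, 0.0]},
    index=["A", "B", "C"],
)

# Start from NaN everywhere, then fill in only the rows with non-zero production.
nonzero = df["production"] != 0
df["ei_s1"] = np.nan
df.loc[nonzero, "ei_s1"] = df.loc[nonzero, "co2_s1"] / df.loc[nonzero, "production"]
```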

In [33]:
# Exelon doesn't own any generation, so it has no Scope 1 emissions

rmi_targets_df.loc["Exelon Corp.", :, :, :]

company_lei,company_id,sector,year,co2_s1_by_year,co2_s2_by_year,co2_s3_by_year,production_by_year,ei_s1_by_year,ei_s2_by_year,ei_s1s2_by_year,ei_s3_by_year,ei_s1s2s3_by_year
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2014,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,39.06149614347343 CO2 * megametric_ton,73.31853699999999 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.5327642604689921 CO2 * megametric_ton / tera...,0.5327642604689921 CO2 * megametric_ton / tera...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2015,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,42.96308083862668 CO2 * megametric_ton,75.567596 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.5685384094874036 CO2 * megametric_ton / tera...,0.5685384094874036 CO2 * megametric_ton / tera...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2016,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,36.42383317597144 CO2 * megametric_ton,78.89776799999999 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.46165860073470577 CO2 * megametric_ton / ter...,0.46165860073470577 CO2 * megametric_ton / ter...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2017,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,35.17266935965131 CO2 * megametric_ton,76.79278799999999 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.4580204766058411 CO2 * megametric_ton / tera...,0.4580204766058411 CO2 * megametric_ton / tera...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2018,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,36.49442797128381 CO2 * megametric_ton,82.48666799999998 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.44242819907919934 CO2 * megametric_ton / ter...,0.44242819907919934 CO2 * megametric_ton / ter...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2019,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,30.84340414290968 CO2 * megametric_ton,82.542234 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.3736681532378889 CO2 * megametric_ton / tera...,0.3736681532378889 CO2 * megametric_ton / tera...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2020,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,30.39421714285133 CO2 * megametric_ton,81.342795 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.3736559229720509 CO2 * megametric_ton / tera...,0.3736559229720509 CO2 * megametric_ton / tera...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2021,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,58.932877171544305 CO2 * megametric_ton,162.68559 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.36225013642292664 CO2 * megametric_ton / ter...,0.36225013642292664 CO2 * megametric_ton / ter...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2022,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,28.53866002869297 CO2 * megametric_ton,81.342795 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.35084434987380225 CO2 * megametric_ton / ter...,0.35084434987380225 CO2 * megametric_ton / ter...
3SOUA6IRML7435B56G12,US30161N1019,Electricity Utilities,2023,0.0 CO2 * megametric_ton,0.0 CO2 * megametric_ton,27.61088147161379 CO2 * megametric_ton,81.342795 terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.33943856332467787 CO2 * megametric_ton / ter...,0.33943856332467787 CO2 * megametric_ton / ter...


In [34]:
steel_production.iloc[0:2]

company_name,company_lei,company_id,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,...,2041,2042,2043,2044,2045,2046,2047,2048,2049,2050
AK STEEL HOLDING CORP,529900DT4E7ZNETMVC04,US0015471081,6132700.0,7089200.0,6051800.0,5596200.0,5683400.0,5342200.0,5422332.999999999,5503667.994999998,5586223.014924997,5670016.360148872,...,7412642.793455035,7523832.435356861,7636689.921887212,7751240.27071552,7867508.874776252,7985521.507897895,8105304.330516361,8226883.895474106,8350287.153906216,8475541.461214809
ARCELORMITTAL,2EULGUTUI56JI9SAL165,LU0140205948,85100000.0,84600000.0,83900000.0,85200000.0,83900000.0,84500000.0,69100000.0,62900000.0,63843499.99999999,64801152.49999999,...,84717179.91199829,85987937.61067826,87277756.67483841,88586923.02496098,89915726.87033537,91264462.7733904,92633429.71499124,94022931.1607161,95433275.1281268,96864774.25504872


In [35]:
# Reshape each wide table (one column per year) into long form, adding year to the index.
mdt_production = (
    steel_production.melt(
        var_name="year", value_name="production_by_year", ignore_index=False
    )
    .dropna()
    .set_index(["year"], append=True)
)
# display(mdt_production)
mdt_co2 = pd.concat(
    [
        steel_co2[scope]
        .melt(var_name="year", value_name=f"co2_{scope}_by_year", ignore_index=False)
        .dropna()
        .set_index(["year"], append=True)
        for scope in rmi_scopes
    ],
    join="outer",
    axis=1,
)
# display(mdt_co2)
mdt_ei = pd.concat(
    [
        steel_ei[scope]
        .melt(var_name="year", value_name=f"ei_{scope}_by_year", ignore_index=False)
        .dropna()
        .set_index(["year"], append=True)
        for scope in rmi_scopes
    ],
    join="outer",
    axis=1,
)
# display(mdt_ei)
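
The wide-to-long reshaping above can be hard to picture. Here is a minimal sketch, using made-up company names and numbers (not data from this pipeline) and plain floats rather than Pint quantities. It shows how `melt` with `ignore_index=False`, followed by `set_index(..., append=True)`, produces a table indexed by company and year:

```python
import pandas as pd

# Hypothetical wide table: one row per company, one column per year.
wide = pd.DataFrame(
    {2019: [10.0, 20.0], 2020: [11.0, None]},
    index=pd.Index(["ACME", "BOLT"], name="company_name"),
)

# melt() keeps the company index (ignore_index=False), dropna() removes the
# missing BOLT/2020 cell, and set_index(append=True) adds year as a new level.
long = (
    wide.melt(var_name="year", value_name="production_by_year", ignore_index=False)
    .dropna()
    .set_index("year", append=True)
)
print(long)
```

Because every melted table ends up with the same index levels, `pd.concat(..., join="outer", axis=1)` can then align them row by row.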

In [36]:
# Combine production, emissions, and intensity into one long table for steel.
steel_targets_df = pd.concat([mdt_production, mdt_co2, mdt_ei], join="outer", axis=1)
# Tag every row with its sector and make the sector an index level.
steel_targets_df.insert(2, "sector", "Steel")
steel_targets_df.set_index(["sector"], append=True, inplace=True)
steel_targets_df = steel_targets_df.reorder_levels(
    order=["company_name", "company_lei", "company_id", "sector", "year"]
)
# Derive the combined-scope intensities from summed emissions divided by production.
steel_targets_df["ei_s1s2_by_year"] = (
    steel_targets_df.co2_s1_by_year + steel_targets_df.co2_s2_by_year
) / steel_targets_df.production_by_year
steel_targets_df["ei_s1s2s3_by_year"] = (
    steel_targets_df.co2_s1_by_year
    + steel_targets_df.co2_s2_by_year
    + steel_targets_df.co2_s3_by_year
) / steel_targets_df.production_by_year
# Append the steel rows to rmi_targets_df, then split emissions from targets.
targets_df = pd.concat([rmi_targets_df, steel_targets_df])
emissions_df = targets_df[["co2_s1_by_year", "co2_s2_by_year", "co2_s3_by_year"]]
targets_df = targets_df[
    [
        "production_by_year",
        "ei_s1_by_year",
        "ei_s2_by_year",
        "ei_s1s2_by_year",
        "ei_s3_by_year",
        "ei_s1s2s3_by_year",
    ]
]
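
The combined-scope intensities are sums of emissions divided by production, so a missing scope propagates as `nan` into any combination that includes it. The sketch below illustrates this with made-up numbers and plain floats (the notebook's columns hold Pint quantities, which are assumed to follow the same arithmetic with units attached):

```python
import numpy as np
import pandas as pd

# Hypothetical emissions and production for two years; scope 3 is not reported.
df = pd.DataFrame(
    {
        "co2_s1_by_year": [40.0, 36.0],
        "co2_s2_by_year": [60.0, 54.0],
        "co2_s3_by_year": [np.nan, np.nan],
        "production_by_year": [1000.0, 900.0],
    },
    index=pd.Index([2014, 2015], name="year"),
)

# Same formulas as the cell above, applied to the toy data.
df["ei_s1s2_by_year"] = (df.co2_s1_by_year + df.co2_s2_by_year) / df.production_by_year
df["ei_s1s2s3_by_year"] = (
    df.co2_s1_by_year + df.co2_s2_by_year + df.co2_s3_by_year
) / df.production_by_year
print(df[["ei_s1s2_by_year", "ei_s1s2s3_by_year"]])
```

This would be consistent with rows in the output where `ei_s1s2_by_year` has a value while `ei_s3_by_year` and `ei_s1s2s3_by_year` are both `nan`.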

In [37]:
targets_df

company_name,company_lei,company_id,sector,year,production_by_year,ei_s1_by_year,ei_s2_by_year,ei_s1s2_by_year,ei_s3_by_year,ei_s1s2s3_by_year
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2014,36.19461958615737 terawatt_hour,0.7613534388895496 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.7613534388895496 CO2 * megametric_ton / tera...,0.1251801131912772 CO2 * megametric_ton / tera...,0.886533552080827 CO2 * megametric_ton / teraw...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2015,31.44311084393575 terawatt_hour,0.7022036041071335 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.7022036041071335 CO2 * megametric_ton / tera...,0.1657609724940575 CO2 * megametric_ton / tera...,0.867964576601191 CO2 * megametric_ton / teraw...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2016,30.373284382535495 terawatt_hour,0.6829725520701145 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.6829725520701145 CO2 * megametric_ton / tera...,0.1725469505065893 CO2 * megametric_ton / tera...,0.855519502576704 CO2 * megametric_ton / teraw...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2017,18.967821914705045 terawatt_hour,0.5517859527636777 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.5517859527636777 CO2 * megametric_ton / tera...,0.26094421220836866 CO2 * megametric_ton / ter...,0.8127301649720464 CO2 * megametric_ton / tera...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2018,20.87552203581955 terawatt_hour,0.5211819040195862 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.5211819040195862 CO2 * megametric_ton / tera...,0.2069258217605436 CO2 * megametric_ton / tera...,0.7281077257801298 CO2 * megametric_ton / tera...
...,...,...,...,...,...,...,...,...,...,...
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2046,nan Fe * metric_ton,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2047,nan Fe * metric_ton,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2048,nan Fe * metric_ton,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2049,nan Fe * metric_ton,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe


In [38]:
targets_df.loc["WORTHINGTON INDUSTRIES INC"]

company_lei,company_id,sector,year,production_by_year,ei_s1_by_year,ei_s2_by_year,ei_s1s2_by_year,ei_s3_by_year,ei_s1s2s3_by_year
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2014,3282000.0 Fe * metric_ton,0.040174588665447895 CO2 / Fe,0.05978062157221207 CO2 / Fe,0.09995521023765996 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2015,3510000.0 Fe * metric_ton,0.03756495726495727 CO2 / Fe,0.0558974358974359 CO2 / Fe,0.09346239316239316 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2016,3523000.0 Fe * metric_ton,0.03587822878228782 CO2 / Fe,0.05445359068975305 CO2 / Fe,0.09033181947204087 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2017,4070000.0 Fe * metric_ton,0.03266437346437347 CO2 / Fe,0.043394840294840295 CO2 / Fe,0.07605921375921376 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2018,3820000.0 Fe * metric_ton,0.0366369109947644 CO2 / Fe,0.046062303664921464 CO2 / Fe,0.08269921465968587 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2019,3715000.0 Fe * metric_ton,0.03613916554508748 CO2 / Fe,0.04328371467025572 CO2 / Fe,0.0794228802153432 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2020,3830000.0 Fe * metric_ton,0.03407467362924282 CO2 / Fe,0.03634490861618799 CO2 / Fe,0.07041958224543081 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2021,4067000.0 Fe * metric_ton,0.03197717482173592 CO2 / Fe,0.03383279075485616 CO2 / Fe,0.06580996557659208 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2022,nan Fe * metric_ton,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe
1WRCIANKYOIK6KYE5E82,US9818111026,Steel,2023,nan Fe * metric_ton,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe


In [39]:
targets_df.unstack(level="year")["ei_s1_by_year"].sample(15).sort_index(
    level=["company_name"], ascending=[1]
)

company_name,company_lei,company_id,sector,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,...,2041,2042,2043,2044,2045,2046,2047,2048,2049,2050
Avista Corp.,Q0IK63NITJD6RJ47SW96,US05379B1070,Electricity Utilities,0.1332592936977594 CO2 * megametric_ton / tera...,0.19217446355445073 CO2 * megametric_ton / ter...,0.1747206751121883 CO2 * megametric_ton / tera...,0.17010422530639044 CO2 * megametric_ton / ter...,0.16219434601047744 CO2 * megametric_ton / ter...,0.18990076314011445 CO2 * megametric_ton / ter...,0.1638073874954286 CO2 * megametric_ton / tera...,0.14006130705894165 CO2 * megametric_ton / ter...,0.11643091825366798 CO2 * megametric_ton / ter...,0.09291580990925741 CO2 * megametric_ton / ter...,...,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour
CLEVELAND-CLIFFS INC,549300TM2WLI2BJMDD86,US1858991011,Steel,0.384718670726176 CO2 / Fe,0.35742364177896635 CO2 / Fe,0.36919177409457066 CO2 / Fe,0.35637795124245286 CO2 / Fe,0.34643674680532327 CO2 / Fe,0.33781346015346797 CO2 / Fe,0.3436249356218983 CO2 / Fe,0.3724038742143742 CO2 / Fe,0.3649712106536525 CO2 / Fe,0.3576768991361501 CO2 / Fe,...,0.11853648406571919 CO2 / Fe,0.10380863410243607 CO2 / Fe,0.08949020181244491 CO2 / Fe,0.07557230272690633 CO2 / Fe,0.06204622555575234 CO2 / Fe,0.04890342900945997 CO2 / Fe,0.03613553867694087 CO2 / Fe,0.023734343958581854 CO2 / Fe,0.011691795053488602 CO2 / Fe,0.0 CO2 / Fe
CMS Energy Corp.,549300IA9XFBAGNIBW29,US1258961002,Electricity Utilities,0.43570844498331984 CO2 * megametric_ton / ter...,0.4414023783109466 CO2 * megametric_ton / tera...,0.3500608807289515 CO2 * megametric_ton / tera...,0.3146165065053435 CO2 * megametric_ton / tera...,0.2965829857486904 CO2 * megametric_ton / tera...,0.3038877805815029 CO2 * megametric_ton / tera...,0.26296826257071493 CO2 * megametric_ton / ter...,0.2493869739778797 CO2 * megametric_ton / tera...,0.23585196181144655 CO2 * megametric_ton / ter...,0.22236310647720536 CO2 * megametric_ton / ter...,...,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour
"Consolidated Edison, Inc.",54930033SBW53OO8T749,US2091151041,Electricity Utilities,0.0698826823845419 CO2 * megametric_ton / tera...,0.07290919835911609 CO2 * megametric_ton / ter...,0.07619228527163943 CO2 * megametric_ton / ter...,0.07625708332885266 CO2 * megametric_ton / ter...,0.07182255255617352 CO2 * megametric_ton / ter...,0.06638858729703817 CO2 * megametric_ton / ter...,0.0748282916983819 CO2 * megametric_ton / tera...,0.0710868714288003 CO2 * megametric_ton / tera...,0.06734545714307395 CO2 * megametric_ton / ter...,0.06360404285734762 CO2 * megametric_ton / ter...,...,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour
Edison International,549300I7ROF15MAEVP56,US2810201077,Electricity Utilities,0.06322972522372573 CO2 * megametric_ton / ter...,0.036273503457639446 CO2 * megametric_ton / te...,0.031698478789318606 CO2 * megametric_ton / te...,0.026322703048176112 CO2 * megametric_ton / te...,0.014571001415030383 CO2 * megametric_ton / te...,0.02378128079315014 CO2 * megametric_ton / ter...,0.024106236089818676 CO2 * megametric_ton / te...,0.023578085695327246 CO2 * megametric_ton / te...,0.02304175492406259 CO2 * megametric_ton / ter...,0.0224971577262481 CO2 * megametric_ton / tera...,...,0.005294767706514935 CO2 * megametric_ton / te...,0.0039977425242062985 CO2 * megametric_ton / t...,0.002683058894820913 CO2 * megametric_ton / te...,0.0013505381454950354 CO2 * megametric_ton / t...,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour
Entergy Corp.,4XM3TW50JULSLG8BNC79,US29364G1031,Electricity Utilities,0.1774501147195159 CO2 * megametric_ton / tera...,0.17377310004430963 CO2 * megametric_ton / ter...,0.2101029575332888 CO2 * megametric_ton / tera...,0.20777247501437673 CO2 * megametric_ton / ter...,0.22061848477958795 CO2 * megametric_ton / ter...,0.2010448790827469 CO2 * megametric_ton / tera...,0.21695286798970911 CO2 * megametric_ton / ter...,0.20315541080684288 CO2 * megametric_ton / ter...,0.18940906363025461 CO2 * megametric_ton / ter...,0.17571378975164526 CO2 * megametric_ton / ter...,...,0.03767067277720645 CO2 * megametric_ton / ter...,0.03341960105950063 CO2 * megametric_ton / ter...,0.02918492459340988 CO2 * megametric_ton / ter...,0.02496662865390368 CO2 * megametric_ton / ter...,0.02076469836093766 CO2 * megametric_ton / ter...,0.016579118679913584 CO2 * megametric_ton / te...,0.012409874422143469 CO2 * megametric_ton / te...,0.008256950245318142 CO2 * megametric_ton / te...,0.004120330653980004 CO2 * megametric_ton / te...,0.0 CO2 * megametric_ton / terawatt_hour
NIPPON STEEL CORP,35380065QWQ4U2V3PA33,JP3381000003,Steel,1.623658733360226 CO2 / Fe,1.623658733360226 CO2 / Fe,1.623658733360226 CO2 / Fe,1.623658733360226 CO2 / Fe,1.6721443298969072 CO2 / Fe,1.7080845500108957 CO2 / Fe,1.716079716079716 CO2 / Fe,1.4703823674798024 CO2 / Fe,1.28283058150118 CO2 / Fe,1.2505080142244909 CO2 / Fe,...,0.39823629783667636 CO2 / Fe,0.34875647320124925 CO2 / Fe,0.3006521320700425 CO2 / Fe,0.2538934260971506 CO2 / Fe,0.2084510887497132 CO2 / Fe,0.16429642463031585 CO2 / Fe,0.12140129898791813 CO2 / Fe,0.07973812741406774 CO2 / Fe,0.03927986572121564 CO2 / Fe,0.0 CO2 / Fe
NUCOR CORP,549300GGJCRSI2TIEJ46,US6703461052,Steel,0.21333333333333335 CO2 / Fe,0.21333333333333335 CO2 / Fe,0.21333333333333335 CO2 / Fe,0.21333333333333335 CO2 / Fe,0.21333333333333335 CO2 / Fe,0.21256038647342995 CO2 / Fe,0.2315270935960591 CO2 / Fe,0.22041285312172992 CO2 / Fe,0.20962468502803272 CO2 / Fe,0.19908352785958486 CO2 / Fe,...,0.050592271939924646 CO2 / Fe,0.04430631368575778 CO2 / Fe,0.0381950980049636 CO2 / Fe,0.03225483293874478 CO2 / Fe,0.0264818004423192 CO2 / Fe,0.020872355028428928 CO2 / Fe,0.015422922434799702 CO2 / Fe,0.01012999831513938 CO2 / Fe,0.00499014695327063 CO2 / Fe,0.0 CO2 / Fe
"NextEra Energy, Inc.",5493008F4ZOQFNG3WN54,US65341B1061,Electricity Utilities,0.4014904917267108 CO2 * megametric_ton / tera...,0.33571724908492845 CO2 * megametric_ton / ter...,0.3138766871379128 CO2 * megametric_ton / tera...,0.32877427289547073 CO2 * megametric_ton / ter...,0.31849142808903647 CO2 * megametric_ton / ter...,0.3022947957469948 CO2 * megametric_ton / tera...,0.2888554347089691 CO2 * megametric_ton / tera...,0.2513025393226922 CO2 * megametric_ton / tera...,0.21374964393641527 CO2 * megametric_ton / ter...,0.17619674855013828 CO2 * megametric_ton / ter...,...,0.016174553244413484 CO2 * megametric_ton / te...,0.012130914933310116 CO2 * megametric_ton / te...,0.008087276622206742 CO2 * megametric_ton / te...,0.004043638311103373 CO2 * megametric_ton / te...,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour
"PNM Resources, Inc.",5493003JOBJGLZSDDQ28,US69349H1077,Electricity Utilities,0.5169363912950559 CO2 * megametric_ton / tera...,0.49125064184105943 CO2 * megametric_ton / ter...,0.48636438972661206 CO2 * megametric_ton / ter...,0.4861061416236354 CO2 * megametric_ton / tera...,0.41532872139055715 CO2 * megametric_ton / ter...,0.4023381981777952 CO2 * megametric_ton / tera...,0.396093132910287 CO2 * megametric_ton / teraw...,0.37525356592842374 CO2 * megametric_ton / ter...,0.3545256333953931 CO2 * megametric_ton / tera...,0.3339088788871631 CO2 * megametric_ton / tera...,...,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour,0.0 CO2 * megametric_ton / terawatt_hour

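`unstack(level="year")` is the inverse of the earlier `melt` step: it moves the year level of the index into the columns. A small sketch on made-up data (the company names and values are hypothetical) shows the shape of the result:

```python
import pandas as pd

# Hypothetical long table indexed by company and year.
idx = pd.MultiIndex.from_product(
    [["ACME", "BOLT"], [2019, 2020]], names=["company_name", "year"]
)
long = pd.DataFrame({"ei_s1_by_year": [0.5, 0.4, 0.9, 0.8]}, index=idx)

# unstack() moves the year level into the columns; selecting the metric then
# leaves a plain company-by-year table.
wide = long.unstack(level="year")["ei_s1_by_year"]
print(wide)
```

With one column per year, a whole year can be read or overwritten at once, which the trajectory calculation relies on.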

In [40]:
traj_df = {}
traj_mdf = {}
traj_udf = targets_df.unstack(level="year")
for scope in ["s1", "s2", "s3", "s1s2", "s1s2s3"]:
    # Start from a copy of the target data; keep the historic years and overwrite the projection below
    traj_df[scope] = traj_udf[f"ei_{scope}_by_year"].copy()
    # Measuring progress over 2014-2019 avoids the anomaly of 2020
    historic_progress = (
        (traj_df[scope][2019] / traj_df[scope][2014]).dropna().map(lambda x: x.m)
    )

    # There are weird artifacts where energy storage systems have negative generation, so treat their progress as zero.
    # If intensity is actually growing, cap the progress ratio at 1 (no progress).
    annualized_progress = historic_progress.where(historic_progress >= 0, 0).where(
        historic_progress <= 1, 1
    ) ** (1 / (2019 - 2014))

    for year in range(2020, 2051):
        traj_df[scope].loc[:, year] = traj_df[scope][2020] * annualized_progress ** (
            year - 2020
        )
    traj_mdf[scope] = (
        traj_df[scope]
        .melt(var_name="year", value_name=f"ei_{scope}_by_year", ignore_index=False)
        .set_index("year", append=True)
        .convert_dtypes()
    )

traj_mdf = pd.concat([*traj_mdf.values()], join="outer", axis=1)
traj_mdf.loc[targets_df.index.intersection(traj_mdf.index), "production_by_year"] = (
    targets_df["production_by_year"]
)
display(traj_mdf.loc["CLEVELAND-CLIFFS INC"])

company_lei,company_id,sector,year,ei_s1_by_year,ei_s2_by_year,ei_s3_by_year,ei_s1s2_by_year,ei_s1s2s3_by_year,production_by_year
549300TM2WLI2BJMDD86,US1858991011,Steel,2014,0.384718670726176 CO2 / Fe,0.049265325603802555 CO2 / Fe,0.05753680954394641 CO2 / Fe,0.4339839963299785 CO2 / Fe,0.491520805873925 CO2 / Fe,91232700.0 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2015,0.35742364177896635 CO2 / Fe,0.049265325603802555 CO2 / Fe,0.03807310110241991 CO2 / Fe,0.4066889673827689 CO2 / Fe,0.4447620684851888 CO2 / Fe,91689200.0 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2016,0.36919177409457066 CO2 / Fe,0.049265325603802555 CO2 / Fe,0.021501243506855894 CO2 / Fe,0.41845709969837325 CO2 / Fe,0.43995834320522914 CO2 / Fe,89951800.0 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2017,0.35637795124245286 CO2 / Fe,0.04926532560380255 CO2 / Fe,0.026981028214837183 CO2 / Fe,0.40564327684625545 CO2 / Fe,0.4326243050610926 CO2 / Fe,90796200.0 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2018,0.34643674680532327 CO2 / Fe,0.049265325603802555 CO2 / Fe,0.027347317092229144 CO2 / Fe,0.39570207240912586 CO2 / Fe,0.42304938950135496 CO2 / Fe,89583400.0 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2019,0.33781346015346797 CO2 / Fe,0.04926532560380256 CO2 / Fe,0.024428405626754465 CO2 / Fe,0.3870787857572705 CO2 / Fe,0.411507191384025 CO2 / Fe,89842200.0 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2020,0.3436249356218983 CO2 / Fe,0.049265325603802555 CO2 / Fe,0.02484865258780881 CO2 / Fe,0.3928902612257008 CO2 / Fe,0.41773891381350964 CO2 / Fe,74522333.0 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2021,0.3348045942178267 CO2 / Fe,0.049265325603802555 CO2 / Fe,0.020935951978650145 CO2 / Fe,0.3840045718009933 CO2 / Fe,0.4031549883374006 CO2 / Fe,68403667.995 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2022,0.32621065786885856 CO2 / Fe,0.049265325603802555 CO2 / Fe,0.017639350210376784 CO2 / Fe,0.3753198430117214 CO2 / Fe,0.38908021074112653 CO2 / Fe,69429723.01492499 Fe * metric_ton
549300TM2WLI2BJMDD86,US1858991011,Steel,2023,0.3178373150937111 CO2 / Fe,0.049265325603802555 CO2 / Fe,0.014861835571729315 CO2 / Fe,0.36683152988955847 CO2 / Fe,0.3754968058677884 CO2 / Fe,70471168.86014886 Fe * metric_ton
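
The trajectory calculation in cell 40 turns the 2014-2019 change in intensity into a constant yearly factor and applies it forward from the 2020 value. The sketch below reproduces that idea on made-up numbers with plain floats. The row labels are hypothetical, and `Series.clip` stands in for the chained `where` calls, which gives the same result when no ratios are missing:

```python
import pandas as pd

# Hypothetical intensities (plain floats, arbitrary units) for three cases.
ei = pd.DataFrame(
    {2014: [1.0, 0.5, 0.8], 2019: [0.5, 0.6, -0.1], 2020: [0.48, 0.62, 0.1]},
    index=["improving", "worsening", "artifact"],
)

base_year, last_year = 2014, 2019
# Ratio of end-of-window to start-of-window intensity.
progress = ei[last_year] / ei[base_year]
# Negative ratios become 0; ratios above 1 become 1 (no progress).
progress = progress.clip(lower=0, upper=1)
# Convert the five-year ratio into a constant per-year factor.
annual = progress ** (1 / (last_year - base_year))

# Project forward from the 2020 value.
for year in range(2021, 2024):
    ei[year] = ei[2020] * annual ** (year - 2020)
print(ei.round(4))
```

In this toy data the "worsening" row stays flat at its 2020 value, and the "artifact" row drops to zero from 2021 onward. The notebook's loop starts at 2020 rather than 2021, which leaves the 2020 value unchanged because the exponent is zero.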


In [41]:
traj_mdf[
    [
        "ei_s1_by_year",
        "ei_s2_by_year",
        "ei_s1s2_by_year",
        "ei_s3_by_year",
        "ei_s1s2s3_by_year",
    ]
]

company_name,company_lei,company_id,sector,year,ei_s1_by_year,ei_s2_by_year,ei_s1s2_by_year,ei_s3_by_year,ei_s1s2s3_by_year
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2014,0.7613534388895496 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.7613534388895496 CO2 * megametric_ton / tera...,0.1251801131912772 CO2 * megametric_ton / tera...,0.886533552080827 CO2 * megametric_ton / teraw...
"ALLETE, Inc.",549300NNLSIMY6Z8OT86,US0185223007,Electricity Utilities,2014,0.5963291711197718 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.5963291711197718 CO2 * megametric_ton / tera...,0.2640188998177555 CO2 * megametric_ton / tera...,0.8603480709375273 CO2 * megametric_ton / tera...
Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,US0158577090,Electricity Utilities,2014,0.4263900524373595 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.4263900524373595 CO2 * megametric_ton / tera...,0.20168922226929425 CO2 * megametric_ton / ter...,0.6280792747066537 CO2 * megametric_ton / tera...
Alliant Energy,5493009ML300G373MZ12,US0188021085,Electricity Utilities,2014,0.44024620047880486 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.44024620047880486 CO2 * megametric_ton / ter...,0.22535921837357425 CO2 * megametric_ton / ter...,0.665605418852379 CO2 * megametric_ton / teraw...
Ameren Corp.,XRZQ5S7HYJFPHJ78L959,US0236081024,Electricity Utilities,2014,0.6153521569974539 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.6153521569974539 CO2 * megametric_ton / tera...,0.11113043309375356 CO2 * megametric_ton / ter...,0.7264825900912075 CO2 * megametric_ton / tera...
...,...,...,...,...,...,...,...,...,...
TENARIS SA,549300Y7C05BKC4HZB40,US88031M1099,Steel,2050,0.3247695000000001 CO2 / Fe,0.058254222222222256 CO2 / Fe,0.35312731595793334 CO2 / Fe,0.09957217492370148 CO2 / Fe,0.3790634137373297 CO2 / Fe
TERNIUM S.A.,529900QG4KU23TEI2E46,US8808901081,Steel,2050,1.62 CO2 / Fe,0.12253698095081486 CO2 / Fe,1.7425369809508149 CO2 / Fe,0.04837506398555913 CO2 / Fe,1.8240434166563824 CO2 / Fe
TIMKENSTEEL CORP,549300QZTZWHDE9HJL14,US8873991033,Steel,2050,0.0672506309264205 CO2 / Fe,0.21952049174308824 CO2 / Fe,0.3 CO2 / Fe,0.031196681671330283 CO2 / Fe,0.22609248274988833 CO2 / Fe
UNITED STATES STEEL CORP,JNLUVFYJT1OZSIQ24U47,US9129091081,Steel,2050,2.123050259965338 CO2 / Fe,0.1733102253032929 CO2 / Fe,2.296360485268631 CO2 / Fe,nan CO2 / Fe,nan CO2 / Fe


In [42]:
# df = traj_mdf[['ei_s1_by_year','ei_s2_by_year','ei_s1s2_by_year','ei_s3_by_year','ei_s1s2s3_by_year']].multiply(traj_mdf['production_by_year'], axis='index')
# df.rename(columns={f"ei_{scope}_by_year":f"co2_{scope}_by_year" for scope in scopes}, inplace=True)
trajectories_df = traj_mdf
# trajectories_df = pd.concat([df, traj_mdf], axis=1)
trajectories_df = trajectories_df[
    [trajectories_df.columns[-1]] + list(trajectories_df.columns[0:-1])
]
trajectories_df

company_name,company_lei,company_id,sector,year,production_by_year,ei_s1_by_year,ei_s2_by_year,ei_s3_by_year,ei_s1s2_by_year,ei_s1s2s3_by_year
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2014,36.19461958615737 terawatt_hour,0.7613534388895496 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.1251801131912772 CO2 * megametric_ton / tera...,0.7613534388895496 CO2 * megametric_ton / tera...,0.886533552080827 CO2 * megametric_ton / teraw...
"ALLETE, Inc.",549300NNLSIMY6Z8OT86,US0185223007,Electricity Utilities,2014,15.816261477442415 terawatt_hour,0.5963291711197718 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.2640188998177555 CO2 * megametric_ton / tera...,0.5963291711197718 CO2 * megametric_ton / tera...,0.8603480709375273 CO2 * megametric_ton / tera...
Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,US0158577090,Electricity Utilities,2014,6.623989945455636 terawatt_hour,0.4263900524373595 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.20168922226929425 CO2 * megametric_ton / ter...,0.4263900524373595 CO2 * megametric_ton / tera...,0.6280792747066537 CO2 * megametric_ton / tera...
Alliant Energy,5493009ML300G373MZ12,US0188021085,Electricity Utilities,2014,30.214415710059214 terawatt_hour,0.44024620047880486 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.22535921837357425 CO2 * megametric_ton / ter...,0.44024620047880486 CO2 * megametric_ton / ter...,0.665605418852379 CO2 * megametric_ton / teraw...
Ameren Corp.,XRZQ5S7HYJFPHJ78L959,US0236081024,Electricity Utilities,2014,55.480899019 terawatt_hour,0.6153521569974539 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.11113043309375356 CO2 * megametric_ton / ter...,0.6153521569974539 CO2 * megametric_ton / tera...,0.7264825900912075 CO2 * megametric_ton / tera...
...,...,...,...,...,...,...,...,...,...,...
TENARIS SA,549300Y7C05BKC4HZB40,US88031M1099,Steel,2050,4600926.629014815 Fe * metric_ton,0.3247695000000001 CO2 / Fe,0.058254222222222256 CO2 / Fe,0.09957217492370148 CO2 / Fe,0.35312731595793334 CO2 / Fe,0.3790634137373297 CO2 / Fe
TERNIUM S.A.,529900QG4KU23TEI2E46,US8808901081,Steel,2050,16337654.439342635 Fe * metric_ton,1.62 CO2 / Fe,0.12253698095081486 CO2 / Fe,0.04837506398555913 CO2 / Fe,1.7425369809508149 CO2 / Fe,1.8240434166563824 CO2 / Fe
TIMKENSTEEL CORP,549300QZTZWHDE9HJL14,US8873991033,Steel,2050,2136902.3560039606 Fe * metric_ton,0.0672506309264205 CO2 / Fe,0.21952049174308824 CO2 / Fe,0.031196681671330283 CO2 / Fe,0.3 CO2 / Fe,0.22609248274988833 CO2 / Fe
UNITED STATES STEEL CORP,JNLUVFYJT1OZSIQ24U47,US9129091081,Steel,2050,23601879.430786483 Fe * metric_ton,2.123050259965338 CO2 / Fe,0.1733102253032929 CO2 / Fe,nan CO2 / Fe,2.296360485268631 CO2 / Fe,nan CO2 / Fe


In [43]:
# Sort both tables on all index levels (label-based slices need a sorted MultiIndex).
targets_df.sort_index(
    level=["company_name", "company_lei", "company_id", "sector", "year"],
    ascending=True,
    inplace=True,
)
trajectories_df.sort_index(
    level=["company_name", "company_lei", "company_id", "sector", "year"],
    ascending=True,
    inplace=True,
)
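
Sorting matters for the label-based selection in the next cell, because pandas needs a lexsorted MultiIndex to slice a range of labels on an inner level. The sketch below uses a hypothetical two-level index to show the tuple-of-slices form alongside the equivalent `pd.IndexSlice` spelling:

```python
import pandas as pd

# Hypothetical index; from_product on sorted inputs is already lexsorted.
idx = pd.MultiIndex.from_product(
    [["ACME", "BOLT"], range(2018, 2026)], names=["company_name", "year"]
)
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# All companies, years 2019 through 2024 (label slices include both endpoints).
by_tuple = df.loc[(slice(None), slice(2019, 2024)), :]
by_index_slice = df.loc[pd.IndexSlice[:, 2019:2024], :]
print(by_tuple)
```

Note that `slice(2019, 2024)` on labels includes 2024, unlike positional slicing.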

In [44]:
targets_df.loc[(slice(None), slice(None), slice(None), slice(None), slice(2019, 2024))]

company_name,company_lei,company_id,sector,year,production_by_year,ei_s1_by_year,ei_s2_by_year,ei_s1s2_by_year,ei_s3_by_year,ei_s1s2s3_by_year
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2019,21.198346493669995 terawatt_hour,0.547696121523109 CO2 * megametric_ton / teraw...,0.0 CO2 * megametric_ton / terawatt_hour,0.547696121523109 CO2 * megametric_ton / teraw...,0.15279819789575205 CO2 * megametric_ton / ter...,0.7004943194188611 CO2 * megametric_ton / tera...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2020,19.916025510319997 terawatt_hour,0.47502170564392526 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.47502170564392526 CO2 * megametric_ton / ter...,0.15406538789099775 CO2 * megametric_ton / ter...,0.629087093534923 CO2 * megametric_ton / teraw...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2021,40.0466948077595 terawatt_hour,0.4468217621762177 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.4468217621762177 CO2 * megametric_ton / tera...,0.14557764128663767 CO2 * megametric_ton / ter...,0.5923994034628554 CO2 * megametric_ton / tera...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2022,20.13143132218833 terawatt_hour,0.41853009216299253 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.41853009216299253 CO2 * megametric_ton / ter...,0.13717520288293272 CO2 * megametric_ton / ter...,0.5557052950459253 CO2 * megametric_ton / tera...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2023,20.24028267590053 terawatt_hour,0.3901471852397458 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.3901471852397458 CO2 * megametric_ton / tera...,0.12885762061394965 CO2 * megametric_ton / ter...,0.5190048058536955 CO2 * megametric_ton / tera...
...,...,...,...,...,...,...,...,...,...,...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2020,123.14256571308636 terawatt_hour,0.2785152647441116 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.2785152647441116 CO2 * megametric_ton / tera...,0.10576665855368178 CO2 * megametric_ton / ter...,0.3842819232977934 CO2 * megametric_ton / tera...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2021,247.94255667108877 terawatt_hour,0.2582947535231334 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.2582947535231334 CO2 * megametric_ton / tera...,0.10010030072088673 CO2 * megametric_ton / ter...,0.3583950542440202 CO2 * megametric_ton / tera...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2022,124.80775757856199 terawatt_hour,0.23832795212013563 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.23832795212013563 CO2 * megametric_ton / ter...,0.09450331463296094 CO2 * megametric_ton / ter...,0.33283126675309654 CO2 * megametric_ton / ter...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2023,125.65209041324857 terawatt_hour,0.2186133447747759 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.2186133447747759 CO2 * megametric_ton / tera...,0.08897529111734365 CO2 * megametric_ton / ter...,0.30758863589211954 CO2 * megametric_ton / ter...


In [45]:
trajectories_df.loc[
    (slice(None), slice(None), slice(None), slice(None), slice(2019, 2024))
]

company_name,company_lei,company_id,sector,year,production_by_year,ei_s1_by_year,ei_s2_by_year,ei_s3_by_year,ei_s1s2_by_year,ei_s1s2s3_by_year
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2019,21.198346493669995 terawatt_hour,0.547696121523109 CO2 * megametric_ton / teraw...,0.0 CO2 * megametric_ton / terawatt_hour,0.15279819789575205 CO2 * megametric_ton / ter...,0.547696121523109 CO2 * megametric_ton / teraw...,0.7004943194188611 CO2 * megametric_ton / tera...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2020,19.916025510319997 terawatt_hour,0.47502170564392526 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.15406538789099775 CO2 * megametric_ton / ter...,0.47502170564392526 CO2 * megametric_ton / ter...,0.629087093534923 CO2 * megametric_ton / teraw...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2021,40.0466948077595 terawatt_hour,0.44473788365872813 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.15406538789099775 CO2 * megametric_ton / ter...,0.44473788365872813 CO2 * megametric_ton / ter...,0.6001401236507244 CO2 * megametric_ton / tera...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2022,20.13143132218833 terawatt_hour,0.4163847310790225 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.15406538789099775 CO2 * megametric_ton / ter...,0.4163847310790225 CO2 * megametric_ton / tera...,0.5725251268336704 CO2 * megametric_ton / tera...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2023,20.24028267590053 terawatt_hour,0.38983916290070536 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.15406538789099775 CO2 * megametric_ton / ter...,0.38983916290070536 CO2 * megametric_ton / ter...,0.5461808133439816 CO2 * megametric_ton / tera...
...,...,...,...,...,...,...,...,...,...,...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2020,123.14256571308636 terawatt_hour,0.2785152647441116 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.10576665855368178 CO2 * megametric_ton / ter...,0.2785152647441116 CO2 * megametric_ton / tera...,0.3842819232977934 CO2 * megametric_ton / tera...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2021,247.94255667108877 terawatt_hour,0.2605596142357657 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.103562283796719 CO2 * megametric_ton / teraw...,0.2605596142357657 CO2 * megametric_ton / tera...,0.3641041405651871 CO2 * megametric_ton / tera...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2022,124.80775757856199 terawatt_hour,0.2437615497774127 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.10140385232789241 CO2 * megametric_ton / ter...,0.2437615497774127 CO2 * megametric_ton / tera...,0.3449858479915513 CO2 * megametric_ton / tera...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2023,125.65209041324857 terawatt_hour,0.2280464427465744 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.09929040660324626 CO2 * megametric_ton / ter...,0.2280464427465744 CO2 * megametric_ton / tera...,0.32687141412263593 CO2 * megametric_ton / ter...


### TODO: Implement Units

Intensity and production data need units to distinguish, for example, TWh of electricity generation from tons of steel production.

Company data is currently converted to USD by the SEC_DERA ingestion, but the pipeline should support other currencies in the future.
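
To illustrate why units matter here, the sketch below sums production only within a single unit, so TWh of generation is never added to tons of steel. It uses plain Python rather than Pint; the record layout, the company names, the toy magnitudes, and the `sum_production_by_unit` helper are all hypothetical and not part of the pipeline.

```python
from collections import defaultdict

# Hypothetical records: (company, magnitude, units). Values are illustrative only.
records = [
    ("Utility A", 20.0, "terawatt_hour"),
    ("Utility B", 5.0, "terawatt_hour"),
    ("Steelmaker C", 3.0, "Fe * metric_ton"),
]


def sum_production_by_unit(rows):
    """Sum magnitudes separately for each unit string."""
    totals = defaultdict(float)
    for _company, magnitude, units in rows:
        totals[units] += magnitude
    return dict(totals)


totals = sum_production_by_unit(records)
print(totals)
```

Keeping the unit next to every magnitude is what makes this kind of guard possible; a bare column of floats would silently mix the two sectors.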

In [46]:
# If DF_COL contains Pint quantities (because it is a PintArray or an array of Pint Quantities),
# return a two-column dataframe of magnitudes and units.
# If DF_COL contains no Pint quantities, return it unchanged.


def dequantify_column(df_col: pd.Series) -> pd.DataFrame:
    if isinstance(df_col.values, PintArray):
        return pd.DataFrame(
            {
                df_col.name: df_col.values.quantity.m,
                df_col.name + "_units": str(df_col.values.dtype.units),
            },
            index=df_col.index,
        )
    elif df_col.size == 0:
        return df_col
    elif df_col.map(lambda x: isinstance(x, Quantity)).any():
        # Process mixed quantity columns - extract values and units
        return pd.DataFrame(
            {
                df_col.name: df_col.map(
                    lambda x: x.m if isinstance(x, Quantity) else x
                ),
                df_col.name + "_units": df_col.map(
                    lambda x: str(x.u) if isinstance(x, Quantity) else None
                ),
            },
            index=df_col.index,
        )
    else:
        return df_col


# Rewrite dataframe DF so that columns containing Pint quantities are represented by a column for the Magnitude and column for the Units.
# The magnitude column retains the original column name and the units column is renamed with a _units suffix.
def dequantify_df(df: pd.DataFrame) -> pd.DataFrame:
    return pd.concat([dequantify_column(df[col]) for col in df.columns], axis=1)
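
The magnitude/units split that `dequantify_df` performs can be pictured without Pint or pandas. In this minimal sketch, a quantity is modeled as a hypothetical `(magnitude, units)` tuple and a table as a dict of column lists; `split_quantity_columns` is an illustrative name, not a function from this notebook, and the values are toy data.

```python
def split_quantity_columns(table):
    """Replace each column holding (magnitude, units) tuples with two columns.

    The magnitude column keeps the original name; the units column gets a
    "_units" suffix. Columns without tuples pass through unchanged.
    """
    result = {}
    for name, values in table.items():
        if any(isinstance(v, tuple) for v in values):
            result[name] = [v[0] if isinstance(v, tuple) else v for v in values]
            result[name + "_units"] = [
                v[1] if isinstance(v, tuple) else None for v in values
            ]
        else:
            result[name] = values
    return result


# Toy input: one plain column and one quantity column with a missing value.
table = {
    "year": [2019, 2020],
    "ei_s1_by_year": [(0.5, "CO2 / Fe"), None],
}
flat = split_quantity_columns(table)
print(flat)
```

As in the mixed-quantity branch of `dequantify_column`, a non-quantity entry keeps its value in the magnitude column and gets `None` in the units column.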

In [47]:
# Because this DF comes from reading a Trino table, and because column names must be unique, we don't have to enumerate columns to ensure we properly handle duplicated names


def requantify_df(df: pd.DataFrame) -> pd.DataFrame:
    units_col = None
    columns_reversed = reversed(df.columns)
    for col in columns_reversed:
        if col.endswith("_units"):
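            if units_col:
                # We expect each _units column to follow a non-units column
                raise ValueError(f"adjacent units columns: {col}, {units_col}")
            units_col = col
            continue
        if units_col:
            if col + "_units" != units_col:
                raise ValueError(f"units column {units_col} does not match {col}")
            # Use .iloc[0] so the lookup is positional even if the index is not 0-based
            first_units = df[units_col].iloc[0]
            if (df[units_col] == first_units).all():
                # Make a PintArray
                new_col = PintArray(df[col], dtype=f"pint[{ureg(first_units).u}]")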
            else:
                # Make a pd.Series of Quantity in a way that does not throw UnitStrippedWarning
                new_col = pd.Series(data=df[col], name=col) * pd.Series(
                    data=df[units_col].map(lambda x: ureg(x).u), name=col
                )
            df = df.drop(columns=units_col)
            df[col] = new_col
            units_col = None
    return df
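
The column-pairing rule that `requantify_df` relies on can be checked in isolation. The sketch below walks column names in reverse, as the function above does, and pairs each `X_units` column with the `X` column immediately before it. `pair_units_columns` is a hypothetical helper written for illustration; its final check for a dangling units column is an addition that the notebook function does not make.

```python
def pair_units_columns(columns):
    """Return (value_column, units_column) pairs, scanning right to left.

    Raises ValueError when a "_units" column is not immediately preceded by
    its matching value column.
    """
    pairs = []
    units_col = None
    for col in reversed(columns):
        if col.endswith("_units"):
            if units_col:
                raise ValueError(f"adjacent units columns: {col}, {units_col}")
            units_col = col
            continue
        if units_col:
            if col + "_units" != units_col:
                raise ValueError(f"{units_col} does not follow {col}")
            pairs.append((col, units_col))
            units_col = None
    if units_col:
        raise ValueError(f"{units_col} has no value column before it")
    return pairs[::-1]  # restore left-to-right order


cols = [
    "year",
    "ei_s1_by_year",
    "ei_s1_by_year_units",
    "ei_s2_by_year",
    "ei_s2_by_year_units",
]
print(pair_units_columns(cols))
```

Validating the layout before touching any data gives a clear error when a table was written by something other than `dequantify_df`.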

In [48]:
targets_to_sql = dequantify_df(targets_df.drop(columns="production_by_year"))
targets_to_sql.loc[:, :, :, "Steel"]

company_name,company_lei,company_id,year,ei_s1_by_year,ei_s1_by_year_units,ei_s2_by_year,ei_s2_by_year_units,ei_s1s2_by_year,ei_s1s2_by_year_units,ei_s3_by_year,ei_s3_by_year_units,ei_s1s2s3_by_year,ei_s1s2s3_by_year_units
AK STEEL HOLDING CORP,529900DT4E7ZNETMVC04,US0015471081,2014,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe
AK STEEL HOLDING CORP,529900DT4E7ZNETMVC04,US0015471081,2015,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe
AK STEEL HOLDING CORP,529900DT4E7ZNETMVC04,US0015471081,2016,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe
AK STEEL HOLDING CORP,529900DT4E7ZNETMVC04,US0015471081,2017,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe
AK STEEL HOLDING CORP,529900DT4E7ZNETMVC04,US0015471081,2018,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe
...,...,...,...,...,...,...,...,...,...,...,...,...,...
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,2046,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,2047,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,2048,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe
WORTHINGTON INDUSTRIES INC,1WRCIANKYOIK6KYE5E82,US9818111026,2049,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe,,CO2 / Fe


In [49]:
financial_df[financial_df.company_market_cap.isnull()]

company_name,company_lei,company_id,sector,country,region,exposure,currency,year,company_market_cap,company_revenue,company_ev,company_evic,company_total_assets,company_cash_equivalents,company_debt
Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,US0158577090,Electricity Utilities,CA,North America,equity,USD,2019,,1624921000.0,,,10911470000.0,62485000.0,6500799000.0
AltaGas Ltd.,549300D7A8QA85Z2MH11,CA0213611001,Electricity Utilities,CA,North America,equity,USD,2019,,4217079301.548375,,,15191078477.615889,43820787.646663,5257496846.427197
Brookfield Asset Management,C6J3FGIWG6MBDGTE8F80,CA1125851040,Electricity Utilities,CA,North America,equity,USD,2019,,67826000000.0,,,323969000000.0,6778000000.0,
Brookfield Renewable Partners LP,VA8DFMRI2GY8Y7V79H93,BMG162581083,Electricity Utilities,BM,North America,equity,USD,2019,,2980000000.0,,,35691000000.0,115000000.0,
Cleco Corp.,5493002H80P81B3HXL31,ZZ00000000320,Electricity Utilities,US,North America,equity,USD,2019,,1639605000.0,,,7476298000.0,116292000.0,400000000.0
"Emera, Inc.",NQZVQT2P5IUF2PGA1Q48,CA2908761018,Electricity Utilities,CA,North America,equity,USD,2019,,4689821949.365263,,,24436804207.44374,170371538.661281,10497803051.1156
"Fortis, Inc.",549300MQYQ9Y065XPR71,CA3495531079,Electricity Utilities,CA,North America,equity,USD,2019,,6740419928.207348,,,40984331759.76149,283952564.435468,16500713751.154068
GERDAU S.A.,254900YDV6SEQQPZVG24,US3737371050,Steel,BR,Global,equity,USD,2019,,9277358887.952824,,,12637594776.748106,618190583.169522,
GROUP SIMEC SA DE CV,529900LCYCXPA0TZEU09,MXP4984U1083,Steel,MX,Global,equity,USD,2019,,1810692140.166067,,,2332290813.325633,394578553.29882,
MECHEL PAO,253400C9GSPBSKERRP65,US5838406081,Steel,RU,Global,equity,USD,2019,,4790625095.912008,,,5048081194.461899,56682987.188579,


In [50]:
tablenames = (
    "company_data",
    "target_data",
    "trajectory_data",
    "production_data",
    "emissions_data",
)

dataframes = [
    financial_df.loc[
        financial_df.index.intersection(targets_df.reset_index("year").index)
    ]
    .reset_index()
    .convert_dtypes(),
    dequantify_df(targets_df.drop(columns="production_by_year"))
    .reset_index()
    .convert_dtypes(),
    dequantify_df(trajectories_df.drop(columns="production_by_year"))
    .reset_index()
    .convert_dtypes(),
    dequantify_df(targets_df[["production_by_year"]]).reset_index().convert_dtypes(),
    dequantify_df(emissions_df).reset_index().convert_dtypes(),
]

for ingest_table, df in zip(tablenames, dataframes, strict=False):
    drop_table = osc._do_sql(
        f"drop table if exists {demo_schema}.{itr_prefix}{ingest_table}",
        engine,
        verbose=True,
    )

    columnschema = osc.create_table_schema_pairs(df)

    tabledef = f"""
create table if not exists {ingest_catalog}.{demo_schema}.{itr_prefix}{ingest_table}(
{columnschema}
) with (
    format = 'ORC',
    partitioning = array['year']
)
"""

    qres = osc._do_sql(tabledef, engine, verbose=True)
    df.to_sql(
        f"{itr_prefix}{ingest_table}",
        con=engine,
        schema=demo_schema,
        if_exists="append",
        index=False,
        method=osc.TrinoBatchInsert(batch_size=1200, verbose=True),
    )

drop table if exists demo_dv.itr_company_data

create table if not exists osc_datacommons_dev.demo_dv.itr_company_data(
    company_name varchar,
    company_lei varchar,
    company_id varchar,
    sector varchar,
    country varchar,
    region varchar,
    exposure varchar,
    currency varchar,
    year bigint,
    company_market_cap double,
    company_revenue double,
    company_ev double,
    company_evic double,
    company_total_assets double,
    company_cash_equivalents double,
    company_debt double
) with (
    format = 'ORC',
    partitioning = array['year']
)

constructed fully qualified table name as: "demo_dv.itr_company_data"
inserting 48 records
  ('AES Corp.', '2NUNNB7D43COUIRE5295', 'US00130H1059', 'Electricity Utilities', 'US', 'North America', 'equity', 'USD', 2019, 10870000000.0, 10189000000.0, 10102000000.0, 11131000000.0, 33648000000.0, 1029000000.0, 261000000.0)
  ('Algonquin Power & Utilities Corp.', '549300K5VIUTJXQL7X75', 'US0158577090', 'Electricity Util

In [51]:
targets_df.index.names

FrozenList(['company_name', 'company_lei', 'company_id', 'sector', 'year'])

In [52]:
targets_df.columns

Index(['production_by_year', 'ei_s1_by_year', 'ei_s2_by_year',
       'ei_s1s2_by_year', 'ei_s3_by_year', 'ei_s1s2s3_by_year'],
      dtype='object')

In [53]:
targets_df

company_name,company_lei,company_id,sector,year,production_by_year,ei_s1_by_year,ei_s2_by_year,ei_s1s2_by_year,ei_s3_by_year,ei_s1s2s3_by_year
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2014,36.19461958615737 terawatt_hour,0.7613534388895496 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.7613534388895496 CO2 * megametric_ton / tera...,0.1251801131912772 CO2 * megametric_ton / tera...,0.886533552080827 CO2 * megametric_ton / teraw...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2015,31.44311084393575 terawatt_hour,0.7022036041071335 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.7022036041071335 CO2 * megametric_ton / tera...,0.1657609724940575 CO2 * megametric_ton / tera...,0.867964576601191 CO2 * megametric_ton / teraw...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2016,30.373284382535495 terawatt_hour,0.6829725520701145 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.6829725520701145 CO2 * megametric_ton / tera...,0.1725469505065893 CO2 * megametric_ton / tera...,0.855519502576704 CO2 * megametric_ton / teraw...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2017,18.967821914705045 terawatt_hour,0.5517859527636777 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.5517859527636777 CO2 * megametric_ton / tera...,0.26094421220836866 CO2 * megametric_ton / ter...,0.8127301649720464 CO2 * megametric_ton / tera...
AES Corp.,2NUNNB7D43COUIRE5295,US00130H1059,Electricity Utilities,2018,20.87552203581955 terawatt_hour,0.5211819040195862 CO2 * megametric_ton / tera...,0.0 CO2 * megametric_ton / terawatt_hour,0.5211819040195862 CO2 * megametric_ton / tera...,0.2069258217605436 CO2 * megametric_ton / tera...,0.7281077257801298 CO2 * megametric_ton / tera...
...,...,...,...,...,...,...,...,...,...,...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2046,147.4560889382498 terawatt_hour,0.01549675314325775 CO2 * megametric_ton / ter...,0.0 CO2 * megametric_ton / terawatt_hour,0.01549675314325775 CO2 * megametric_ton / ter...,0.009135442965712622 CO2 * megametric_ton / te...,0.024632196108970373 CO2 * megametric_ton / te...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2047,148.51795809253457 terawatt_hour,0.011539466198706676 CO2 * megametric_ton / te...,0.0 CO2 * megametric_ton / terawatt_hour,0.011539466198706676 CO2 * megametric_ton / te...,0.006802594991255847 CO2 * megametric_ton / te...,0.018342061189962525 CO2 * megametric_ton / te...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2048,149.59023589706763 terawatt_hour,0.0076378334322524805 CO2 * megametric_ton / t...,0.0 CO2 * megametric_ton / terawatt_hour,0.0076378334322524805 CO2 * megametric_ton / t...,0.004502555539017086 CO2 * megametric_ton / te...,0.012140388971269565 CO2 * megametric_ton / te...
"Xcel Energy, Inc.",LGJNMI9GH8XIDG5RCM61,US98389B1008,Electricity Utilities,2049,150.67304159882428 terawatt_hour,0.003791472226051064 CO2 * megametric_ton / te...,0.0 CO2 * megametric_ton / terawatt_hour,0.003791472226051064 CO2 * megametric_ton / te...,0.0022350990531356915 CO2 * megametric_ton / t...,0.006026571279186755 CO2 * megametric_ton / te...


In [54]:
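# stop!  (apparently a bare `stop!` originally, which raises the SyntaxError shown below and halts a run-all)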

pdf = targets_df.pivot(
    index=["company_name", "company_lei", "company_id"], columns="year"
).reset_index()

SyntaxError: invalid syntax (4014574974.py, line 1)

In [None]:
pdf

In [None]:
# stop!
# pdf.insert(1, 'company_lei', pdf.company_name.str.upper().map(gleif_match))
# pdf.insert(2, 'company_id', pdf.company_lei.map(rmi_lei_dict))
# pdf = pdf.set_index(['company_name','company_lei', 'company_id'], drop=True)
pdf.columns.names = [None, None]
pdf

In [None]:
ei_s1_df = pd.concat(
    [
        pdf.company_name,
        pdf.company_lei,
        pdf.company_id,
        pdf.ei_s1_target_by_year.reset_index(),
    ],
    axis=1,
).drop("index", axis=1)
ei_s1_df

In [None]:
ei_s2_df = pd.concat(
    [
        pdf.company_name,
        pdf.company_lei,
        pdf.company_id,
        pdf.ei_s2_target_by_year.reset_index(),
    ],
    axis=1,
).drop("index", axis=1)
ei_s2_df

In [None]:
ei_s1_df.iloc[:, 3] = 2 * ei_s1_df.iloc[:, 4] - ei_s1_df.iloc[:, 5]
ei_s1_df = ei_s1_df[ei_s1_df.company_id.notna()]
ei_s1_df.insert(3, "scope", "S1")
ei_s1_df.head(10)

In [None]:
ei_s2_df.iloc[:, 3] = 2 * ei_s2_df.iloc[:, 4] - ei_s2_df.iloc[:, 5]
ei_s2_df = ei_s2_df[ei_s2_df.company_id.notna()]
ei_s2_df.insert(3, "scope", "S2")
ei_s2_df.head(10)

In [None]:
ei_s1_df.iloc[:, 3] = 2 * ei_s1_df.iloc[:, 4] - ei_s1_df.iloc[:, 5]
try:
    ei_s1_df = co2_ei_df[co2_ei_df.company_id.notna()]
except NameError:
    print(
        "Warning: co2_ei_df not defined. Please run earlier cells to define this variable."
    )
    ei_s1_df = None
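if ei_s1_df is not None:
    # Only label the data when co2_ei_df was available; None has no insert()
    ei_s1_df.insert(3, "scope", "S1")
ei_s1_df.head(10) if ei_s1_df is not None else None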

In [None]:
co2_df = pd.concat(
    [
        pdf.company_name,
        pdf.company_lei,
        pdf.company_id,
        pdf.co2_target_by_year.reset_index(),
    ],
    axis=1,
).drop("index", axis=1)
co2_df = co2_df[co2_df.company_id.notna()]
co2_df.insert(3, "scope", "S1+S2")
co2_df.head()

In [None]:
gen_df = pd.concat(
    [
        pdf.company_name,
        pdf.company_lei,
        pdf.company_id,
        pdf.production_by_year.reset_index(),
    ],
    axis=1,
).drop("index", axis=1)
gen_df.iloc[:, 3] = 2 * gen_df.iloc[:, 4] - gen_df.iloc[:, 5]
gen_df = gen_df[gen_df.company_id.notna()]
gen_df.insert(3, "production", "TWh")
gen_df.head()

with pd.ExcelWriter("rmi-20220307-output.xlsx", datetime_format="YYYY") as writer:
    financial_df.to_excel(writer, sheet_name="fundamental_data", index=False)
    co2_ei_df.to_excel(writer, sheet_name="projected_ei_in_Wh", index=False)
    gen_df.to_excel(writer, sheet_name="projected_production", index=False)
    co2_df.to_excel(writer, sheet_name="projected_co2", index=False)

In [None]:
try:
    portfolio_zero = portfolio_df.copy()
    portfolio_zero.target_probability = 0.0
    portfolio_one = portfolio_df.copy()
    portfolio_one.target_probability = 1.0
except NameError:
    print(
        "Warning: portfolio_df not defined. Please run earlier cells to define this variable."
    )
    portfolio_zero = None
    portfolio_one = None

try:
    portfolio_df.to_csv("rmi-20220307-portfolio.csv", sep=";", index=False)
except NameError:
    print("Warning: portfolio_df not defined. Skipping CSV export.")

In [None]:
engine.execute(
    f"select count (*) from (select parent_name from {rmi_schema}.utility_information group by parent_name)"
).fetchall()

If the following query returns any rows (companies with no matching intensity target data), the Data Vault will reject the company data.

In [None]:
engine.execute(
    f"select C.company_name, C.company_id, EI.* from {demo_schema}.company_data C left join {demo_schema}.intensity_data EI on EI.company_name=C.company_name where EI.co2_intensity_target_by_year is NULL"
).fetchall()
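
The check above is a left-join anti-join: companies with no matching intensity row come back with NULLs on the intensity side. The sketch below shows the pattern with Python's built-in `sqlite3` rather than Trino. The rows are toy values with hypothetical company names and IDs; only the table and column names are taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table company_data (company_name text, company_id text);
    create table intensity_data (
        company_name text,
        co2_intensity_target_by_year real
    );
    insert into company_data values ('Company A', 'ID-A'), ('Company B', 'ID-B');
    insert into intensity_data values ('Company A', 0.5);
    """
)

# Companies without intensity data survive the left join with NULL intensity.
missing = conn.execute(
    """
    select C.company_name, C.company_id
    from company_data C
    left join intensity_data EI on EI.company_name = C.company_name
    where EI.co2_intensity_target_by_year is null
    """
).fetchall()
print(missing)
conn.close()
```

An empty result means every company has intensity data; any returned row names a company that would need to be fixed or dropped before loading.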