# Data Vault Demo (User, can only score own portfolio)

The basic concept of the Data Vault is that when users authenticate themselves, they receive an engine that gives them access to all the data (rows, columns, tables, schema, etc.) for which they are authorized.  Users who can authenticate themselves for multiple roles can use those roles simultaneously.  We are keeping in mind the importance of Data Lineage Management (tracked by issue https://github.com/os-climate/os_c_data_commons/issues/50), but it is not treated as part of this particular prototype.

The steps of this demo are:

1. **Authenticate and acquire SQLAlchemy engine**
    1. Dev engine sees all
    2. Quant engine can do temperature scoring but cannot see fundamental company info
    3. **User engine can use temperature scoring but cannot see cumulative emissions or overshoot info**
2. With Dev engine, construct Vaults for:
    1. Fundamental corporate financial information
    2. Corporate emissions data (base year, historical)
    3. Corporate target data (start year, end year, target start value, target end value)
    4. Sector benchmark data (production, CO2e intensity)
3. Dev Engine: Visualize projected emissions (targets and trajectories) and calculate cumulative emissions
4. Quant Engine: Using calculated cumulative emissions, visualize per-company trajectory and target temperature scores
5. **User Engine: Using consensus probability scoring and own portfolio data (ISIN, position value)**
    1. **Calculate publishable per-company temperature alignment score**
    2. **Based on aggregate corporate and portfolio information, produce weighting scores to yield overall portfolio alignment score**
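
The role model in step 1 can be pictured as a set of denied tables per role, with multiple roles combining as a union of what each may read. The sketch below is purely illustrative: `ALL_TABLES`, `DENIED`, and `visible_tables` are hypothetical names, the mapping of roles to table names is an assumption read off the step list above, and real enforcement happens on the server side rather than in client code.

```python
# Hypothetical illustration only; actual access control is enforced by the server.
ALL_TABLES = {
    "itr_fundamental_data",
    "itr_cumulative_emissions",
    "itr_overshoot_ratios",
    "itr_temperature_scores",
}

# Assumed mapping from each role to the tables it may NOT read.
DENIED = {
    "dev": set(),
    "quant": {"itr_fundamental_data"},
    "user": {"itr_cumulative_emissions", "itr_overshoot_ratios"},
}


def visible_tables(roles):
    """Return the tables readable by someone holding all the given roles at once."""
    allowed = set()
    for role in roles:
        # Each role contributes everything it is not explicitly denied.
        allowed |= ALL_TABLES - DENIED[role]
    return allowed


print(sorted(visible_tables(["user"])))
print(sorted(visible_tables(["quant", "user"])))
```

Under these assumptions, holding the Quant and User roles together exposes every table in the toy set, because each role covers what the other is denied.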

In [1]:
import os
import pathlib
from dotenv import load_dotenv
import osc_ingest_trino as osc
import trino
from sqlalchemy.engine import create_engine

# Load some standard environment variables from a dot-env file, if it exists.
# If no such file can be found, does not fail, and so allows these environment vars to
# be populated in some other way
dotenv_dir = os.environ.get(
    "CREDENTIAL_DOTENV_DIR", os.environ.get("PWD", "/opt/app-root/src")
)
dotenv_path = pathlib.Path(dotenv_dir) / "credentials.env"
if os.path.exists(dotenv_path):
    load_dotenv(dotenv_path=dotenv_path, override=True)

### The ITR module provides Vault objects that coordinate the interaction of Dev, Quant, and User roles.

The SQLAlchemy engines mediate the actual interaction with the Data Vault.

In [2]:
import pandas as pd

# from ITR.portfolio_aggregation import PortfolioAggregationMethod
# from ITR.temperature_score import TemperatureScore
# from ITR.configs import ColumnsConfig, TemperatureScoreConfig
# from ITR.data.data_warehouse import DataWarehouse
from ITR.data.vault_providers import DataVaultWarehouse, VaultCompanyDataProvider

# from ITR.interfaces import ICompanyData, EScope, ETimeFrames, PortfolioCompany, IEIBenchmarkScopes, \
#     IProductionBenchmarkScopes
from ITR.interfaces import EScope  # , IProductionBenchmarkScopes, IEIBenchmarkScopes



using connect string: trino://MichaelTiemannOSC@trino-secure-odh-trino.apps.odh-cl2.apps.os-climate.org:443/osc_datacommons_dev/demo_dv


### Step 5: Show per-company temperature score and weighted portfolio alignment score

Portfolio weighting scores (which ultimately influence portfolio alignment score) include:
* WATS (size of portfolio company positions used as weights)
* TETS (size of total emissions of portfolio companies used as weights)
* Financial fundamental weights:
    * Market Cap
    * Enterprise Value
    * Assets
    * Revenues

We can pass a list of company IDs to the Data Vault to get back a sum without exposing granular data.
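
As a rough illustration of that idea, the sketch below uses an in-memory SQLite database in place of the Data Vault. The table layout, the company IDs, the revenue figures, and the `sum_for_companies` helper are all made up for the example; the point is only that the caller receives an aggregate and never the per-company rows.

```python
import sqlite3

# Stand-in for the vault: a tiny in-memory table with made-up values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_data (company_id TEXT PRIMARY KEY, company_revenue REAL)"
)
conn.executemany(
    "INSERT INTO company_data VALUES (?, ?)",
    [("ID_A", 100.0), ("ID_B", 250.0), ("ID_C", 400.0)],
)


def sum_for_companies(conn, company_ids):
    """Return only the aggregate for the requested IDs, never individual rows."""
    placeholders = ", ".join("?" for _ in company_ids)
    query = (
        "SELECT SUM(company_revenue) FROM company_data "
        f"WHERE company_id IN ({placeholders})"
    )
    (total,) = conn.execute(query, list(company_ids)).fetchone()
    return total


total = sum_for_companies(conn, ["ID_A", "ID_C"])
print(total)
```

Binding the IDs as query parameters, rather than formatting them into the SQL string, keeps the list of identifiers out of the statement text itself.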

In [3]:
sqlstring = "trino://{user}@{host}:{port}/".format(
    user=os.environ["TRINO_USER_USER3"],
    host=os.environ["TRINO_HOST"],
    port=os.environ["TRINO_PORT"],
)

ingest_catalog = "osc_datacommons_dev"
ingest_schema = "demo_dv"
itr_prefix = "itr_"

sqlargs = {
    "auth": trino.auth.JWTAuthentication(os.environ["TRINO_PASSWD_USER3"]),
    "http_scheme": "https",
    "catalog": ingest_catalog,
    "schema": ingest_schema,
}

engine_user = create_engine(sqlstring, connect_args=sqlargs)
print("connecting with engine " + str(engine_user))
osc._do_sql(f"show tables in {ingest_schema}", engine_user, verbose=True)

connecting with engine Engine(trino://os-climate-user3@trino-secure-odh-trino.apps.odh-cl2.apps.os-climate.org:443/)
show tables in demo_dv
[('itr_benchmark_ei',), ('itr_benchmark_prod',), ('itr_company_data',), ('itr_cumulative_budget_1',), ('itr_cumulative_emissions',), ('itr_emissions_data',), ('itr_fundamental_data',), ('itr_overshoot_ratios',), ('itr_production_data',), ('itr_target_data',), ('itr_temperature_scores',), ('itr_trajectory_data',)]



Show that we *cannot* access fundamental company data or cumulative emissions. (The fundamental-data restriction cannot be demonstrated until the op1st team changes permissions.)
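
One way to make such a check explicit is to wrap a trivial query in a `try`/`except` and report whether it succeeded. The sketch below is a hypothetical helper, not part of the ITR module: `can_read` and `fake_run_query` are invented names, and the fake runner simply simulates a denial for one table so that the example runs without a Trino connection.

```python
def can_read(run_query, table):
    """Return True if a trivial SELECT on `table` succeeds, False if it raises."""
    try:
        run_query(f"select 1 from {table} limit 1")
        return True
    except Exception as exc:  # a real client raises a driver-specific error
        print(f"{table}: access denied or query failed ({exc})")
        return False


def fake_run_query(sql):
    """Simulated query runner that denies access to one table (assumption)."""
    if "itr_cumulative_emissions" in sql:
        raise PermissionError("Access Denied")
    return [(1,)]


for table in ("itr_temperature_scores", "itr_cumulative_emissions"):
    print(table, can_read(fake_run_query, table))
```

In this notebook the runner could plausibly be a small wrapper around `osc._do_sql(sql, engine_user, verbose=False)`, the call shape used in the cells here, although the exact exception raised on a permission failure is not shown in this demo.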

In [4]:
vault_company_data = VaultCompanyDataProvider(
    engine_user,
    company_table=f"{itr_prefix}company_data",
    target_table=None,
    trajectory_table=None,
    company_schema=ingest_schema,
    column_config=None,
)


select C.company_name, C.company_id from demo_dv.itr_company_data C left join demo_dv.itr_target_data EI on EI.company_name=C.company_name
where EI.ei_s1_by_year is NULL and EI.ei_s1s2_by_year is NULL and EI.ei_s1s2s3_by_year is NULL



2023-03-13 19:05:04,616 - ITR.data.vault_providers - ERROR - Provide either historic emissions data or projections for companies with IDs []


[]


In [5]:
vault_warehouse = DataVaultWarehouse(
    engine_user,
    company_data=None,
    benchmark_projected_production=None,
    benchmarks_projected_ei=None,
    ingest_schema=ingest_schema,
    itr_prefix=itr_prefix,
    column_config=None,
)

Show that we *can* access only temperature scores and weighting methods

In [6]:
portfolio_df = pd.read_csv(
    "data/mdt-20220116-portfolio.csv",
    encoding="iso-8859-1",
    sep=";",
    index_col="company_id",
)
# portfolio_df = pd.read_csv("data/rmi_all.csv", encoding="iso-8859-1", sep=',', index_col='company_id')
portfolio_df

company_id,company_name,company_lei,investment_value
US00130H1059,AES Corp.,2NUNNB7D43COUIRE5295,4351252
US0158577090,Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,2228185
US0185223007,"ALLETE, Inc.",549300NNLSIMY6Z8OT86,3829481
US0188021085,Alliant Energy,5493009ML300G373MZ12,3829481
US0236081024,Ameren Corp.,XRZQ5S7HYJFPHJ78L959,15917812
...,...,...,...
US8873991033,TIMKENSTEEL CORP,549300QZTZWHDE9HJL14,10000000
US88830M1027,TITAN INTERNATIONAL INC,254900CXRGBE7C4B5A06,10000000
US9129091081,UNITED STATES STEEL CORP,JNLUVFYJT1OZSIQ24U47,10000000
US9138371003,UNIVERSAL STAINLESS & ALLOY PRODUCTS INC,5493001OEIZDUGXZDE09,10000000


### Calculate portfolio alignment temperature score based on WATS

We can do this with information exclusive to the user space (and the probability-adjusted temperature scores).

Note that companies with no production information (such as TITAN INTERNATIONAL INC and UNIVERSAL STAINLESS & ALLOY PRODUCTS INC) will show NaN (Not a Number) as a score.
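
The WATS arithmetic used in the following cells can be checked by hand on a toy portfolio. The sketch below uses plain Python and made-up company names, position values, and scores; it drops positions whose score is NaN before computing the weights, mirroring the `dropna` step in the notebook.

```python
import math

# (company, investment_value, pa_score); all values are made up.
positions = [
    ("COMPANY_A", 4_000_000, 1.5),
    ("COMPANY_B", 6_000_000, 2.0),
    ("COMPANY_C", 1_000_000, math.nan),  # no production data, so no score
]

# Drop unscored positions first so they do not dilute the weights.
scored = [(name, value, score) for name, value, score in positions
          if not math.isnan(score)]

total_value = sum(value for _, value, _ in scored)

# Each company contributes its score times its share of the scored portfolio.
wats_score = sum(score * value / total_value for _, value, score in scored)
print(round(wats_score, 6))
```

With these numbers the shares are 0.4 and 0.6, so the portfolio score is 0.4 × 1.5 + 0.6 × 2.0 = 1.8.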

In [7]:
# PA_SCORE means "Probability-Adjusted" Temperature Score
portfolio_df["pa_score"] = vault_warehouse.get_pa_temp_scores(
    probability=0.5, company_ids=portfolio_df.index.values
).astype("pint[delta_degC]")

In [8]:
# portfolio_df[portfolio_df.company_name=='POSCO']
portfolio_df.dropna(inplace=True)
portfolio_df.sort_values(by="company_name")

company_id,company_name,company_lei,investment_value,pa_score
US00130H1059,AES Corp.,2NUNNB7D43COUIRE5295,4351252,1.5350088173907537
US0158577090,Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,2228185,1.7479615344904824
US0188021085,Alliant Energy,5493009ML300G373MZ12,3829481,1.494738321300956
US0236081024,Ameren Corp.,XRZQ5S7HYJFPHJ78L959,15917812,1.9670458989359405
US0255371017,"American Electric Power Co., Inc.",1B4S6S7G0TW5EE83BO58,45520637,1.4636075258728485
US05351W1036,"Avangrid, Inc.",549300OX0Q38NLSKPB49,10049068,1.2634482190308003
US05379B1070,Avista Corp.,Q0IK63NITJD6RJ47SW96,2804211,1.4010426933807054
US1442851036,CARPENTER TECHNOLOGY CORP,DX6I6ZD3X5WNNCDJKP85,10000000,2.112734790465044
US1258961002,CMS Energy,549300IA9XFBAGNIBW29,9153135,1.4118106074040455
US2017231034,COMMERCIAL METALS CO,549300OQS2LO07ZJ7N73,10000000,1.305262028847558


In [9]:
weight_for_WATS = portfolio_df["investment_value"].sum()
weight_for_WATS

627818278

In [10]:
portfolio_df["WATS_weight"] = portfolio_df["pa_score"] * (
    portfolio_df["investment_value"] / weight_for_WATS
)
portfolio_df.head()

company_id,company_name,company_lei,investment_value,pa_score,WATS_weight
US00130H1059,AES Corp.,2NUNNB7D43COUIRE5295,4351252,1.5350088173907537,0.0106387635096682
US0158577090,Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,2228185,1.7479615344904824,0.0062036767775159
US0188021085,Alliant Energy,5493009ML300G373MZ12,3829481,1.494738321300956,0.0091174026019578
US0236081024,Ameren Corp.,XRZQ5S7HYJFPHJ78L959,15917812,1.9670458989359405,0.0498728181574753
US0255371017,"American Electric Power Co., Inc.",1B4S6S7G0TW5EE83BO58,45520637,1.4636075258728485,0.1061204320268707


In [11]:
print(
    f"Portfolio temperature score based on WATS = {portfolio_df['WATS_weight'].sum()}"
)

Portfolio temperature score based on WATS = 1.4686959033135665 delta_degree_Celsius


### Calculate portfolio alignment temperature score based on TETS

We need to carefully meld portfolio data with corporate fundamental data (in this case, emissions).
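
The sketch below shows one way to read that melding step. It assumes that TETS weights each company's score by its share of the total emissions of the portfolio companies, and that only companies present in both data sets are counted. The exact formula inside `compute_portfolio_weights` is not shown in this notebook, so treat this as an approximation; all names and numbers are made up.

```python
# User-side data: probability-adjusted scores for the portfolio (made up).
scores = {"COMPANY_A": 1.5, "COMPANY_B": 2.0}

# Vault-side data: emissions for a wider universe of companies (made up).
emissions = {"COMPANY_A": 300.0, "COMPANY_B": 100.0, "COMPANY_Z": 900.0}

# Meld: keep only companies that appear in both the portfolio and the vault.
common = [company for company in scores if company in emissions]

# The denominator is the total over portfolio companies only, not the universe.
total_emissions = sum(emissions[company] for company in common)

tets_contrib = {
    company: scores[company] * emissions[company] / total_emissions
    for company in common
}
tets_score = sum(tets_contrib.values())
print(tets_contrib, tets_score)
```

Note that `COMPANY_Z` is ignored even though it has the largest emissions: it is not in the portfolio, so it must not enter the denominator.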

In [12]:
portfolio_df["TETS_weight"] = vault_company_data.compute_portfolio_weights(
    portfolio_df["pa_score"], 2019, "emissions", EScope.S1S2
)
portfolio_df.head()

company_id,company_name,company_lei,investment_value,pa_score,WATS_weight,TETS_weight
US00130H1059,AES Corp.,2NUNNB7D43COUIRE5295,4351252,1.5350088173907537,0.0106387635096682,1.0849933534112957e-07
US0158577090,Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,2228185,1.7479615344904824,0.0062036767775159,3.539160990159548e-08
US0188021085,Alliant Energy,5493009ML300G373MZ12,3829481,1.494738321300956,0.0091174026019578,8.787456932352129e-08
US0236081024,Ameren Corp.,XRZQ5S7HYJFPHJ78L959,15917812,1.9670458989359405,0.0498728181574753,7.520484173433357e-07
US0255371017,"American Electric Power Co., Inc.",1B4S6S7G0TW5EE83BO58,45520637,1.4636075258728485,0.1061204320268707,5.108819415703074e-07


In [13]:
print(
    f"Portfolio temperature score based on TETS = {portfolio_df['TETS_weight'].sum()}"
)

Portfolio temperature score based on TETS = 1.5370187660004366 delta_degree_Celsius


### Calculate portfolio alignment temperature score based on MOTS, EOTS, ECOTS, AOTS, and ROTS

* MOTS = market cap weights
* EOTS = enterprise value weights
* ECOTS = EVIC weights
* AOTS = asset weights
* ROTS = revenue weights
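
All five methods follow the same pattern with a different fundamental metric, which is why a single loop can cover them. The sketch below assumes that each company's contribution is its score times its share of the chosen metric, and that companies with a missing metric are left out; the real method may instead use an ownership-adjusted formula. `weighted_score` is a hypothetical helper and the figures are made up.

```python
def weighted_score(scores, metric):
    """Score-weighted average using `metric` as weights, skipping missing values."""
    usable = [company for company in scores if metric.get(company) is not None]
    total = sum(metric[company] for company in usable)
    return sum(scores[company] * metric[company] / total for company in usable)


scores = {"A": 1.5, "B": 2.0, "C": 1.0}

# Made-up fundamentals; company C has no market cap on record.
fundamentals = {
    "company_market_cap": {"A": 10.0, "B": 30.0, "C": None},
    "company_revenue": {"A": 5.0, "B": 5.0, "C": 10.0},
}

results = {
    column: weighted_score(scores, values)
    for column, values in fundamentals.items()
}
print(results)
```

Skipping a company for one metric changes the denominator for that metric only, so scores computed under different weightings are not based on exactly the same set of companies. The blank cells in the output table below reflect the same kind of gap.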

In [14]:
weighting_dict = {
    "MOTS": "company_market_cap",
    "EOTS": "company_ev",
    "ECOTS": "company_evic",
    "AOTS": "company_total_assets",
    "ROTS": "company_revenue",
}

for k, v in weighting_dict.items():
    weight_column = f"{k}_weight"
    portfolio_df[weight_column] = vault_company_data.compute_portfolio_weights(
        portfolio_df["pa_score"], 2019, v, EScope.S1S2
    )
    print(
        f"Portfolio temperature score based on {k} = {portfolio_df[weight_column].sum()}"
    )

portfolio_df

Portfolio temperature score based on MOTS = 1.4584327499028584 delta_degree_Celsius
Portfolio temperature score based on EOTS = 1.4582657692201524 delta_degree_Celsius
Portfolio temperature score based on ECOTS = 1.4586193391103974 delta_degree_Celsius
Portfolio temperature score based on AOTS = 1.4765667685306014 delta_degree_Celsius
Portfolio temperature score based on ROTS = 1.4788400390399583 delta_degree_Celsius


company_id,company_name,company_lei,investment_value,pa_score,WATS_weight,TETS_weight,MOTS_weight,EOTS_weight,ECOTS_weight,AOTS_weight,ROTS_weight
US00130H1059,AES Corp.,2NUNNB7D43COUIRE5295,4351252,1.5350088173907537,0.0106387635096682,1.0849933534112957e-07,0.0268136873139678,0.0150473635883901,0.02890669886064,0.0381036480744492,0.0411921536701335
US0158577090,Algonquin Power & Utilities Corp.,549300K5VIUTJXQL7X75,2228185,1.7479615344904824,0.0062036767775159,3.539160990159548e-08,,,,0.0140705658123378,0.0074805957298406
US0188021085,Alliant Energy,5493009ML300G373MZ12,3829481,1.494738321300956,0.0091174026019578,8.787456932352129e-08,0.0278637317289608,0.0268388298269739,0.0274795835977224,0.0184160409141051,0.0143600629694102
US0236081024,Ameren Corp.,XRZQ5S7HYJFPHJ78L959,15917812,1.9670458989359405,0.0498728181574753,7.520484173433357e-07,0.0580961264050034,0.0530732954583222,0.0572645869735542,0.0419859890734391,0.0306178038863786
US0255371017,"American Electric Power Co., Inc.",1B4S6S7G0TW5EE83BO58,45520637,1.4636075258728485,0.1061204320268707,5.108819415703074e-07,0.1022936532080732,0.1042710556963704,0.1013134765902173,0.081944305024551,0.0599853728248903
US05351W1036,"Avangrid, Inc.",549300OX0Q38NLSKPB49,10049068,1.2634482190308003,0.0202231720745145,2.0178677739435216e-10,0.0057581078641722,0.0132729557514826,0.0060266761482273,0.0320785159393717,0.0210902508183653
US05379B1070,Avista Corp.,Q0IK63NITJD6RJ47SW96,2804211,1.4010426933807054,0.0062578925621018,2.069181288404246e-08,0.00663862533565,0.0066860632192311,0.0065598564956116,0.0062867584335782,0.0049653116313556
US1258961002,CMS Energy,549300IA9XFBAGNIBW29,9153135,1.4118106074040455,0.0205831743624406,1.0326549097915105e-07,0.0370991057219006,0.0385831367305552,0.0368491034529924,0.0279516103981299,0.0254520023870533
US2091151041,"Consolidated Edison, Inc.",54930033SBW53OO8T749,20394113,1.3282211274422444,0.0431460706246665,1.370411818995966e-08,0.0621125694107252,0.061984827765304,0.0632324214063408,0.0569096535481295,0.0439861450831269
US25746U1097,Dominion Energy,ILUL7B6Z54MRYCF6H308,33528082,1.4220235346451322,0.0759419140000123,1.894021159869545e-07,0.1416819728549624,0.1319988832049714,0.1399061698622414,0.1089172795710344,0.0620660019823605


### Companies for which we lack production data (and thus cannot chart)

In [15]:
portfolio_df[portfolio_df.pa_score.isnull()]

company_id,company_name,company_lei,investment_value,pa_score,WATS_weight,TETS_weight,MOTS_weight,EOTS_weight,ECOTS_weight,AOTS_weight,ROTS_weight


In [16]:
osc._do_sql(
    f"select * from {ingest_schema}.{itr_prefix}company_data",
    engine_user,
    verbose=False,
)

[('AES Corp.', '2NUNNB7D43COUIRE5295', 'US00130H1059', 'Electricity Utilities', 'US', 'North America', 'equity', 'USD', 2019, 10870000000.0, 10189000000.0, 10102000000.0, 11131000000.0, 33648000000.0, 1029000000.0, 261000000.0),
 ('Algonquin Power & Utilities Corp.', '549300K5VIUTJXQL7X75', 'US0158577090', 'Electricity Utilities', 'CA', 'North America', 'equity', 'USD', 2019, None, 1624921000.0, None, None, 10911470000.0, 62485000.0, 6500799000.0),
 ('Alliant Energy', '5493009ML300G373MZ12', 'US0188021085', 'Electricity Utilities', 'US', 'North America', 'equity', 'USD', 2019, 11600000000.0, 3647700000.0, 18503600000.0, 18519900000.0, 16700700000.0, 16300000.0, 6919900000.0),
 ('Ameren Corp.', 'XRZQ5S7HYJFPHJ78L959', 'US0236081024', 'Electricity Utilities', 'US', 'North America', 'equity', 'USD', 2019, 18378774986.0, 5910000000.0, 27804774986.0, 27820774986.0, 28933000000.0, 16000000.0, 9442000000.0),
 ('American Electric Power Co., Inc.', '1B4S6S7G0TW5EE83BO58', 'US0255371017', 'Elect