# KPI Inference
This notebook takes the paragraphs identified as relevant to each KPI in the relevance inference stage, together with the fine-tuned KPI extraction model from the training stage, and runs inference to return a specific answer for each KPI.

In [1]:
from config_qa_farm_train import QAFileConfig, QAInferConfig
import pprint
import pathlib

import os
from src.data.s3_communication import S3Communication, S3FileType
from src.models.text_kpi_infer import TextKPIInfer
from dotenv import load_dotenv
import zipfile
import config

07/16/2022 15:55:21 - INFO - farm.modeling.prediction_head -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .


In [2]:
# Load credentials
dotenv_dir = os.environ.get(
    "CREDENTIAL_DOTENV_DIR", os.environ.get("PWD", "/opt/app-root/src")
)
dotenv_path = pathlib.Path(dotenv_dir) / "credentials.env"
if os.path.exists(dotenv_path):
    load_dotenv(dotenv_path=dotenv_path, override=True)
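
If `credentials.env` is absent, the cell above silently does nothing, and the problem only surfaces later when the S3 connector is created. A minimal sketch of an early check follows. The helper name `missing_env_vars` is hypothetical and not part of this repository; the variable names are the ones the notebook reads when it initialises the S3 connector.

```python
import os

# Environment variables read when the S3 connector is initialised in this notebook.
REQUIRED_VARS = [
    "S3_ENDPOINT",
    "S3_LANDING_ACCESS_KEY",
    "S3_LANDING_SECRET_KEY",
    "S3_BUCKET",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper: `env` defaults to os.environ and exists only
    so the function can be exercised without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing credentials:", ", ".join(missing))
```

Running this right after `load_dotenv` gives a readable message instead of a connection error further down.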

In [3]:
# init s3 connector
s3c = S3Communication(
    s3_endpoint_url=os.getenv("S3_ENDPOINT"),
    aws_access_key_id=os.getenv("S3_LANDING_ACCESS_KEY"),
    aws_secret_access_key=os.getenv("S3_LANDING_SECRET_KEY"),
    s3_bucket=os.getenv("S3_BUCKET"),
)

In [4]:
# Settings for data files and checkpoint parameters
file_config = QAFileConfig("infer_demo")
infer_config = QAInferConfig("infer_demo")

In [5]:
# When running in Automation using Elyra and Kubeflow Pipelines,
# set AUTOMATION = 1 as an environment variable
if os.getenv("AUTOMATION"):

    # inference results dir
    if not os.path.exists(infer_config.relevance_dir['Text']):
        pathlib.Path(infer_config.relevance_dir['Text']).mkdir(parents=True, exist_ok=True)

    # kpi inference results dir
    if not os.path.exists(infer_config.result_dir['Text']):
        pathlib.Path(infer_config.result_dir['Text']).mkdir(parents=True, exist_ok=True)

    # load dir
    if not os.path.exists(infer_config.load_dir['Text']):
        pathlib.Path(infer_config.load_dir['Text']).mkdir(parents=True, exist_ok=True)

    # download relevance predictions from s3
    s3c.download_files_in_prefix_to_dir(
        config.BASE_INFER_RELEVANCE_S3_PREFIX,
        infer_config.relevance_dir['Text'],
    )
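
The three directory blocks above follow the same pattern. Because `mkdir(parents=True, exist_ok=True)` already tolerates an existing directory, the preceding `os.path.exists` check is not strictly needed. The sketch below shows that behaviour in a temporary directory; `ensure_dirs` is a hypothetical helper and the paths are illustrative, not the ones configured in `infer_config`.

```python
import pathlib
import tempfile


def ensure_dirs(paths):
    """Create each directory (and its parents) if missing.

    Existing directories are left untouched, so calling this twice is safe.
    """
    result = []
    for p in paths:
        path = pathlib.Path(p)
        path.mkdir(parents=True, exist_ok=True)
        result.append(path)
    return result


# Demonstration in a throwaway location (illustrative paths only).
with tempfile.TemporaryDirectory() as tmp:
    dirs = ensure_dirs(
        [
            pathlib.Path(tmp, "data", "infer_relevance"),
            pathlib.Path(tmp, "data", "infer_kpi"),
        ]
    )
    ensure_dirs(dirs)  # second call raises no error
    all_exist = all(d.is_dir() for d in dirs)

print(all_exist)
```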

In [6]:
model_root = pathlib.Path(file_config.saved_models_dir).parent
model_rel_zip = pathlib.Path(model_root, 'KPI_EXTRACTION.zip')

# This file must be downloaded because model training doesn't save a local copy
s3c.download_file_from_s3(model_rel_zip, config.CHECKPOINT_S3_PREFIX, "KPI_EXTRACTION.zip")

with zipfile.ZipFile(model_rel_zip, 'r') as z:
    z.extractall(model_root)

## Inference

We can use the saved model and test it on some real examples.<br><br>
First let's load the model:

In [7]:
file_config.saved_models_dir

'/opt/app-root/src/aicoe-osc-demo-MichaelTiemannOSC/models/KPI_EXTRACTION'

In [8]:
tki = TextKPIInfer(infer_config)



Now, let's make a prediction on a paragraph and question pair.

In [9]:
context = """the paris agreement on climate change drafted in 2015 aims to reduce worldwide emissions of greenhouse
gases to a level intended to limit a rise in global temperatures to below 2 degrees or, better still,
to below 1.5 degrees. verbund’s target of reducing greenhouse gas emissions by 90% measured beginning from
the basis year 2011 5 million tonnes co2e until 2021 includes scope 1, scope 2 market- based and parts of scope 3 emissions
for energy and air travel. the science based targets initiative validated this goal as science-based in october 2016,
i.e. it meets global standards. according to current planning, the target can be achieved.
however, if the grid operator requires higher generation volumes
"""
question = "What is the target year for climate commitment?"


In [10]:
QA_input = [
    {
        "qas": [question],
        "context":  context
    }
]

result = tki.infer_on_dict(QA_input)[0]
pprint.pprint(result)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 19.46 Batches/s]

{'predictions': [{'answers': [{'answer': '2021',
                               'context': 'the basis year 2011 5 million '
                                          'tonnes co2e until 2021 includes '
                                          'scope 1, scope 2 market- based and '
                                          'par',
                               'document_id': '0-0',
                               'offset_answer_end': 364,
                               'offset_answer_start': 360,
                               'offset_context_end': 412,
                               'offset_context_start': 312,
                               'probability': None,
                               'score': 7.129114151000977},
                              {'answer': 'no_answer',
                               'context': '',
                               'document_id': '0-0',
                               'offset_answer_end': 0,
                               'offset_answer_start': 0,
      




What does the prediction result show? 

In [11]:
# This is the best answer. It can be either a text span or 'no_answer', whichever scores higher.
# Here the top answer is the span '2021'
result['predictions'][0]['answers'][0]

{'score': 7.129114151000977,
 'probability': None,
 'answer': '2021',
 'offset_answer_start': 360,
 'offset_answer_end': 364,
 'context': 'the basis year 2011 5 million tonnes co2e until 2021 includes scope 1, scope 2 market- based and par',
 'offset_context_start': 312,
 'offset_context_end': 412,
 'document_id': '0-0'}
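
In the output above, `offset_answer_end - offset_answer_start` is 4, the length of `'2021'`, and the context window spans 100 characters. The sketch below assumes these offsets are character positions in the full input text, which is consistent with the numbers shown but is not confirmed here. `answer_span` is a hypothetical helper, and the example string is synthetic rather than the notebook's real context.

```python
def answer_span(document_text, answer):
    """Slice the answer out of the document using the reported offsets.

    Assumes the offsets are character indices into `document_text`.
    """
    start = answer["offset_answer_start"]
    end = answer["offset_answer_end"]
    return document_text[start:end]


# Tiny synthetic example (not the context used in this notebook).
doc = "target is 2021 overall"
ans = {"answer": "2021", "offset_answer_start": 10, "offset_answer_end": 14}

matches = answer_span(doc, ans) == ans["answer"]
print(matches)
```

A check like this can help catch cases where the text was modified between inference and post-processing.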

In [12]:
# No-answer score: it is far below the top span score, so the model is fairly confident that the context contains an answer.
result['predictions'][0]['answers'][1]

{'score': -20.135552406311035,
 'probability': None,
 'answer': 'no_answer',
 'offset_answer_start': 0,
 'offset_answer_end': 0,
 'context': '',
 'offset_context_start': 0,
 'offset_context_end': 0,
 'document_id': '0-0'}
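
To make the comparison between the two entries explicit, the sketch below computes the margin between the best span score and the `no_answer` score. `span_vs_no_answer` is a hypothetical helper; it only assumes the result has the `predictions` / `answers` layout printed above, and the example input reuses the two scores from this notebook's output.

```python
def span_vs_no_answer(result):
    """Return (best_span, no_answer, margin) for the first prediction.

    margin is best_span score minus no_answer score, or None if either
    entry is missing.
    """
    answers = result["predictions"][0]["answers"]
    spans = [a for a in answers if a["answer"] != "no_answer"]
    no_ans = next((a for a in answers if a["answer"] == "no_answer"), None)
    best = max(spans, key=lambda a: a["score"]) if spans else None
    margin = None
    if best is not None and no_ans is not None:
        margin = best["score"] - no_ans["score"]
    return best, no_ans, margin


# Scores copied from the two answers shown above; other fields omitted.
example = {
    "predictions": [
        {
            "answers": [
                {"answer": "2021", "score": 7.129114151000977},
                {"answer": "no_answer", "score": -20.135552406311035},
            ]
        }
    ]
}

best, no_ans, margin = span_vs_no_answer(example)
print(best["answer"], round(margin, 2))
```

A large positive margin, as here, indicates that the span answer is strongly preferred over `no_answer`.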

Now, let's use the model to infer KPI answers from the relevance results.

In [13]:
infer_config.relevance_dir

{'Text': '/opt/app-root/src/aicoe-osc-demo-MichaelTiemannOSC/data/infer_relevance'}

In [18]:
kpi_df = s3c.download_df_from_s3(
    f"{config.EXPERIMENT_NAME}/kpi_mapping",
    "kpi_mapping.csv",
    filetype=S3FileType.CSV,
    header=0,
)
kpi_df.head()

Unnamed: 0,kpi_id,question,sectors,add_year,kpi_category
0,1.0,What is the company name?,"OG, CM, CU",False,TEXT
1,2.0,What is the Start Date of the CDP report publi...,"OG, CM, CU",False,TEXT
2,3.0,What is the End Date of the CDP report published?,"OG, CM, CU",False,TEXT
3,4.0,What is the currency used for all financial in...,"OG, CM, CU",False,TEXT
4,5.0,Did you have an emissions target that was acti...,"OG, CM, CU",False,TEXT


In [19]:
tki.infer_on_relevance_results(infer_config.relevance_dir['Text'], kpi_df)

07/16/2022 16:22:24 - INFO - src.models.text_kpi_infer -   #################### Starting KPI Inference for the following relevance CSV files found in /opt/app-root/src/aicoe-osc-demo-MichaelTiemannOSC/data/infer_kpi:
['2020-cdp-climate-response_predictions_relevant.csv', 'PGE_Corporation_CDP_Climate_Change_Questionnaire_2021_predictions_relevant.csv', 'vodafone-group-cdp-climate-change-questionnaire2021_predictions_relevant.csv', 'Unilever CDP Climate Response_predictions_relevant.csv', 'NextEra Energy 2021 CDP Response_predictions_relevant-checkpoint.csv', 'gap_inc-_cdp_climate_change_questionnaire_2021_predictions_relevant.csv', 'Corning_Incorporated_CDP_Climate_Change_Questionnaire_2021_FINAL_predictions_relevant.csv', 'Michelin-CDP-Climate-Change-2021_def_predictions_relevant.csv', 'bp-cdp-climate-change-questionnaire-2021_predictions_relevant.csv', 'Adobe_CDP_Climate_Change_Questionnaire_2021_predictions_relevant.csv', '2020-cdp-climate-response_predictions_relevant-checkpoint.csv

Unnamed: 0,pdf_name,kpi,kpi_id,answer,page,paragraph,source,score,no_ans_score,no_answer_score_plus_boost
0,2020-cdp-climate-response,Account for your organization’s gross global S...,,2030,15.0,Emissions from refrigeration equipment account...,Text,-3.801866,16.341036,1.341036
1,2020-cdp-climate-response,Account for your organization’s gross global S...,,52608307.5,17.0,Covered emissions in reporting year (metric to...,Text,-4.640134,15.346012,0.346012
2,2020-cdp-climate-response,Account for your organization’s gross global S...,,1558901,37.0,Metric numerator (Gross global combined Scope ...,Text,-6.075352,15.499842,0.499842
3,2020-cdp-climate-response,Account for your organization’s gross global S...,,1558901,37.0,Metric numerator (Gross global combined Scope ...,Text,-6.075352,15.499842,0.499842
4,2020-cdp-climate-response,Break down your total gross global Scope 1 emi...,,687597,30.0,Gross global Scope 1 emissions (metric tons CO...,Text,8.747887,11.323345,-3.676655
...,...,...,...,...,...,...,...,...,...,...
179,Bayer AG Climate Change 2021,What percentage of your total operational spen...,,6,19.0,Figure or percentage in reporting year 6,Text,-15.731588,3.175036,-11.824964
180,Bayer AG Climate Change 2021,What were your organization’s gross global Sco...,,313000,47.0,Verified Scope 1 emissions in metric tons CO2e...,Text,8.552532,13.830592,-1.169408
181,Bayer AG Climate Change 2021,What were your organization’s gross global Sco...,,3.761202 million t CO2e,37.0,"i) Calculation: In 2020, the divestment/closur...",Text,8.039323,17.400318,2.400318
182,Bayer AG Climate Change 2021,What were your organization’s gross global Sco...,,3.761202 million t CO2e,37.0,"i) Calculation: In 2020, 0.134120 million t CO...",Text,5.642964,17.551994,2.551994
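
The table above can contain several candidate answers for the same document and KPI. One possible post-processing step, shown here only as a sketch, is to keep the highest-scoring row per `(pdf_name, kpi)` pair. `best_answer_per_kpi` is a hypothetical helper, not something `TextKPIInfer` provides; the sample rows reuse answers and scores from the output above, with the truncated KPI text replaced by placeholder labels.

```python
def best_answer_per_kpi(rows):
    """Keep the highest-scoring row for each (pdf_name, kpi) pair."""
    best = {}
    for row in rows:
        key = (row["pdf_name"], row["kpi"])
        if key not in best or row["score"] > best[key]["score"]:
            best[key] = row
    return list(best.values())


# "kpi_a" and "kpi_b" are placeholders for the truncated KPI questions.
rows = [
    {"pdf_name": "2020-cdp-climate-response", "kpi": "kpi_a",
     "answer": "2030", "score": -3.801866},
    {"pdf_name": "2020-cdp-climate-response", "kpi": "kpi_a",
     "answer": "52608307.5", "score": -4.640134},
    {"pdf_name": "2020-cdp-climate-response", "kpi": "kpi_b",
     "answer": "687597", "score": 8.747887},
]

selected = best_answer_per_kpi(rows)
for row in selected:
    print(row["pdf_name"], row["kpi"], row["answer"], row["score"])
```

Whether the raw `score` is the right ranking key, or whether it should be compared against `no_answer_score_plus_boost` first, depends on how the downstream consumer interprets these columns.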


In [16]:
if os.getenv("AUTOMATION"):
    # upload the predicted files to s3
    s3c.upload_files_in_dir_to_prefix(
        infer_config.result_dir['Text'],
        config.BASE_INFER_KPI_S3_PREFIX
    )

# Conclusion
This notebook ran _KPI_ inference on a sample dataset and stored the output in CSV format.