# KPI Inference
This notebook takes the KPI-relevant paragraphs identified in the relevance inference stage and the fine-tuned KPI_EXTRACTION model from the training stage, and runs inference to extract specific answers for each KPI.

In [2]:
from config_qa_farm_train import QAFileConfig, QAInferConfig
import pprint
import pathlib
import os
from src.data.s3_communication import S3Communication
from src.models.text_kpi_infer import TextKPIInfer
from dotenv import load_dotenv
import zipfile
import config

03/18/2022 15:57:45 - INFO - farm.modeling.prediction_head -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .


In [3]:
# Load credentials
dotenv_dir = os.environ.get(
    "CREDENTIAL_DOTENV_DIR", os.environ.get("PWD", "/opt/app-root/src")
)
dotenv_path = pathlib.Path(dotenv_dir) / "credentials.env"
if os.path.exists(dotenv_path):
    load_dotenv(dotenv_path=dotenv_path, override=True)

In [4]:
# init s3 connector
s3c = S3Communication(
    s3_endpoint_url=os.getenv("S3_ENDPOINT"),
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    s3_bucket=os.getenv("S3_BUCKET"),
)
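The connector reads four environment variables, and `os.getenv` returns `None` for any that are unset. A minimal sketch of a pre-flight check follows; the helper name `missing_env_vars` is hypothetical and not part of the project.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = ("S3_ENDPOINT", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing S3 settings:", ", ".join(missing))
```

Running this before constructing `S3Communication` points directly at a missing credential, rather than leaving it to surface later as a connection error.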

In [5]:
# Settings for data files and checkpoint parameters
file_config = QAFileConfig("infer_demo")
infer_config = QAInferConfig("infer_demo")

In [6]:
# When running in Automation using Elyra and Kubeflow Pipelines,
# set AUTOMATION = 1 as an environment variable
if os.getenv("AUTOMATION"):
        
    # inference results dir
    if not os.path.exists(infer_config.relevance_dir['Text']):
        pathlib.Path(infer_config.relevance_dir['Text']).mkdir(parents=True, exist_ok=True)

    # kpi inference results dir
    if not os.path.exists(infer_config.result_dir['Text']):
        pathlib.Path(infer_config.result_dir['Text']).mkdir(parents=True, exist_ok=True)

    # load dir
    if not os.path.exists(infer_config.load_dir['Text']):
        pathlib.Path(infer_config.load_dir['Text']).mkdir(parents=True, exist_ok=True)

    # download relevance predictions from s3
    s3c.download_files_in_prefix_to_dir(
        config.BASE_INFER_RELEVANCE_S3_PREFIX,
        infer_config.relevance_dir['Text'],
    )
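`mkdir(parents=True, exist_ok=True)` already tolerates directories that exist, so the `os.path.exists` guards in the cell above are optional. Here is a sketch with a hypothetical `ensure_dirs` helper, demonstrated on a temporary directory rather than the real config paths:

```python
import pathlib
import tempfile


def ensure_dirs(*dirs):
    """Create each directory (and its parents) if needed; return them as Path objects."""
    paths = [pathlib.Path(d) for d in dirs]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


# Demonstration on a throwaway location. In the notebook the arguments would be
# infer_config.relevance_dir['Text'], infer_config.result_dir['Text'], and
# infer_config.load_dir['Text'].
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_dirs(f"{tmp}/data/infer_relevance", f"{tmp}/data/infer_kpi")
    ensure_dirs(*created)  # calling again on existing directories does nothing
    all_exist = all(p.is_dir() for p in created)
```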

In [8]:
# download the fine-tuned KPI extraction checkpoint from s3 and unpack it
model_root = pathlib.Path(file_config.saved_models_dir).parent
model_rel_zip = pathlib.Path(model_root, 'KPI_EXTRACTION.zip')
s3c.download_file_from_s3(model_rel_zip, config.CHECKPOINT_S3_PREFIX, "KPI_EXTRACTION.zip")
with zipfile.ZipFile(model_rel_zip, 'r') as z:
    z.extractall(model_root)
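If the archive does not unpack into the folder that `file_config.saved_models_dir` points to, loading the model later may fail with a less obvious error. Below is a sketch of inspecting an archive's top-level entries before extracting. The helper `top_level_entries` is hypothetical. The demo builds a tiny in-memory archive with a made-up file name instead of the real checkpoint, and it assumes the real archive contains a top-level `KPI_EXTRACTION` folder.

```python
import io
import zipfile


def top_level_entries(zf):
    """Return the set of top-level names (files or folders) inside an open ZipFile."""
    return {name.split("/", 1)[0] for name in zf.namelist() if name}


# Build a small stand-in archive in memory; the file name is a placeholder.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("KPI_EXTRACTION/placeholder.txt", "stand-in for model files")

with zipfile.ZipFile(buffer, "r") as z:
    entries = top_level_entries(z)

if "KPI_EXTRACTION" not in entries:
    raise RuntimeError(f"Unexpected archive layout: {sorted(entries)}")
```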

## Inference

We can use the saved model and test it on some real examples.

First, let's load the model:

In [9]:
file_config.saved_models_dir

'/opt/app-root/src/aicoe-osc-demo-2022-02-28-14-32/models/KPI_EXTRACTION'

In [10]:
tki = TextKPIInfer(infer_config)



Now, let's make a prediction on a single paragraph-question pair.

In [11]:
context = """the paris agreement on climate change drafted in 2015 aims to reduce worldwide emissions of greenhouse 
gases to a level intended to limit a rise in global temperatures to below 2 degrees or, better still,
to below 1.5 degrees. verbund’s target of reducing greenhouse gas emissions by 90% measured beginning from 
the basis year 2011 5 million tonnes co2e until 2021 includes scope 1, scope 2 market- based and parts of scope 3 emissions 
for energy and air travel. the science based targets initiative validated this goal as science-based in october 2016, 
i.e. it meets global standards. according to current planning, the target can be achieved. 
however, if the grid operator requires higher generation volumes 
"""
question = "What is the target year for climate commitment?"
    

In [12]:
QA_input = [
    {
        "qas": [question],
        "context": context,
    }
]

result = tki.infer_on_dict(QA_input)[0]
pprint.pprint(result)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 19.54 Batches/s]

{'predictions': [{'answers': [{'answer': '2021',
                               'context': 'the basis year 2011 5 million '
                                          'tonnes co2e until 2021 includes '
                                          'scope 1, scope 2 market- based and '
                                          'par',
                               'document_id': '0-0',
                               'offset_answer_end': 366,
                               'offset_answer_start': 362,
                               'offset_context_end': 414,
                               'offset_context_start': 314,
                               'probability': None,
                               'score': 7.129114151000977},
                              {'answer': 'no_answer',
                               'context': '',
                               'document_id': '0-0',
                               'offset_answer_end': 0,
                               'offset_answer_start': 0,
      




What does the prediction result show? 

In [13]:
# This is the best answer. It is either a text span or 'no_answer', whichever scores higher.
# Here the top answer is the span '2021'.
result['predictions'][0]['answers'][0]

{'score': 7.129114151000977,
 'probability': None,
 'answer': '2021',
 'offset_answer_start': 362,
 'offset_answer_end': 366,
 'context': 'the basis year 2011 5 million tonnes co2e until 2021 includes scope 1, scope 2 market- based and par',
 'offset_context_start': 314,
 'offset_context_end': 414,
 'document_id': '0-0'}
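The offset fields in this output are consistent with character positions in the full input text, with `context` being a 100-character window around the answer. Assuming that reading is correct, the answer span can be recovered from the window by subtracting `offset_context_start`. The helper name `answer_from_window` below is hypothetical.

```python
# Values copied from the answer shown above.
answer = {
    "answer": "2021",
    "offset_answer_start": 362,
    "offset_answer_end": 366,
    "context": "the basis year 2011 5 million tonnes co2e until 2021 includes scope 1, scope 2 market- based and par",
    "offset_context_start": 314,
    "offset_context_end": 414,
}


def answer_from_window(ans):
    """Slice the answer span out of the context window using window-relative offsets."""
    start = ans["offset_answer_start"] - ans["offset_context_start"]
    end = ans["offset_answer_end"] - ans["offset_context_start"]
    return ans["context"][start:end]


span = answer_from_window(answer)
window_length = answer["offset_context_end"] - answer["offset_context_start"]
print(span, window_length)
```

Comparing the recovered span with `answer["answer"]` is a cheap check when mapping answers back to their source paragraph.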

In [14]:
# No-answer score: it is strongly negative, so the model is fairly confident that the context contains an answer to the question.
result['predictions'][0]['answers'][1]

{'score': -20.135552406311035,
 'probability': None,
 'answer': 'no_answer',
 'offset_answer_start': 0,
 'offset_answer_end': 0,
 'context': '',
 'offset_context_start': 0,
 'offset_context_end': 0,
 'document_id': '0-0'}
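Since `probability` is `None` here, the raw scores are only meaningful relative to each other. One rough way to summarize them is the gap between the best span score and the `no_answer` score. This margin is an illustrative quantity, not something the model reports, and `no_answer_margin` is a hypothetical helper.

```python
def no_answer_margin(answers):
    """Return best span score minus the no_answer score, or None if either is missing."""
    span_scores = [a["score"] for a in answers if a["answer"] != "no_answer"]
    no_answer_scores = [a["score"] for a in answers if a["answer"] == "no_answer"]
    if not span_scores or not no_answer_scores:
        return None
    return max(span_scores) - max(no_answer_scores)


# Scores copied from the two answers shown above.
answers = [
    {"answer": "2021", "score": 7.129114151000977},
    {"answer": "no_answer", "score": -20.135552406311035},
]
margin = no_answer_margin(answers)
print(f"margin: {margin:.2f}")
```

A large positive margin, as in this example, means the span clearly outranks `no_answer`. A margin near zero or below suggests the paragraph may not contain the KPI.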

Now, let's use the model to infer KPI answers from the relevance results.

In [15]:
infer_config.relevance_dir

{'Text': '/opt/app-root/src/aicoe-osc-demo-2022-02-28-14-32/data/infer_relevance'}

In [16]:
tki.infer_on_relevance_results(infer_config.relevance_dir['Text'])

03/18/2022 16:00:38 - INFO - src.models.text_kpi_infer -   #################### Starting KPI Inference for the following relevance CSV files found in /opt/app-root/src/aicoe-osc-demo-2022-02-28-14-32/data/infer_kpi:
['75506106_BOA_2016-12-31_predictions_relevant.csv', 'sustainability-report-2019_predictions_relevant.csv', '90044053_Fisher & Paykel Hl_2017-11-07_predictions_relevant.csv', '88094292_Carriage Svcs Inc_2019-07-23_predictions_relevant.csv'] 
03/18/2022 16:00:38 - INFO - src.models.text_kpi_infer -   #################### 1/4
03/18/2022 16:00:38 - INFO - src.models.text_kpi_infer -   Starting KPI Extraction for 75506106_BOA_2016-12-31
Inferencing Samples: 100%|██████████| 5/5 [00:01<00:00,  4.29 Batches/s]
03/18/2022 16:00:39 - INFO - src.models.text_kpi_infer -   Save the result of KPI extraction to /opt/app-root/src/aicoe-osc-demo-2022-02-28-14-32/data/infer_kpi/75506106_BOA_2016-12-31_predictions_kpi.csv
03/18/2022 16:00:39 - INFO - src.models.text_kpi_infer -   ##########

Unnamed: 0,pdf_name,kpi,kpi_id,answer,page,paragraph,source,score,no_ans_score,no_answer_score_plus_boost,index
0,75506106_BOA_2016-12-31,In which year was the annual report or the sus...,,2015-2016,24.0,Nombre de projets ayant atteint le closing fin...,Text,11.181186,-9.148028,-24.148028,
1,75506106_BOA_2016-12-31,In which year was the annual report or the sus...,,2016,30.0,L’Atelier Finance Climat pour l’Afrique Franc...,Text,11.143324,-9.074394,-24.074394,
2,75506106_BOA_2016-12-31,In which year was the annual report or the sus...,,2016,48.0,"Au cours de l’anne 2016, EFE-Maroc a form 4 74...",Text,11.117598,-9.312975,-24.312975,
3,75506106_BOA_2016-12-31,In which year was the annual report or the sus...,,2016,30.0,"Business Climate Summit, 28-29 juin 2016, Lond...",Text,11.020465,-8.655083,-23.655083,
4,75506106_BOA_2016-12-31,What is the base year for carbon reduction com...,,2016,30.0,"Business Climate Summit, 28-29 juin 2016, Lond...",Text,-8.263815,6.064293,-8.935707,
...,...,...,...,...,...,...,...,...,...,...,...
51,90044053_Fisher & Paykel Hl_2017-11-07,What is the volume of estimated proven hydroca...,,no_answer,,,Text,2.527988,,,
52,90044053_Fisher & Paykel Hl_2017-11-07,What is the volume of estimated proven hydroca...,,3,26.0,"Within the ﬁrst month, we were able to increas...",Text,-9.606719,17.527988,2.527988,
53,90044053_Fisher & Paykel Hl_2017-11-07,What is the volume of estimated proven hydroca...,,0.33 tCO2e/ NZ$M,25.0,This ﬁnancial year we measured waste outputs i...,Text,-9.643618,17.502722,2.502722,
0,88094292_Carriage Svcs Inc_2019-07-23,In which year was the annual report or the sus...,,"February 15, 2017",0.0,"CARRIAGE SERVICES, INC. (the Company) CORPORAT...",Text,8.340643,-9.459160,-24.459160,
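Each report and KPI pair yields several candidate rows, so a common next step is to keep only the highest-scoring row per pair. The sketch below uses the standard `csv` module and a made-up sample with a subset of the columns above (`pdf_name`, `kpi`, `answer`, `score`). The rows and the helper `best_answers` are illustrative, not actual pipeline output.

```python
import csv
import io

# Made-up rows for illustration; only the column names come from the output above.
SAMPLE = """pdf_name,kpi,answer,score
report_a,report year,2015-2016,11.18
report_a,report year,2016,11.14
report_a,base year,2016,-8.26
report_a,base year,no_answer,-8.0
"""


def best_answers(csv_text):
    """Return {(pdf_name, kpi): row}, keeping the highest-scoring row per pair."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["pdf_name"], row["kpi"])
        if key not in best or float(row["score"]) > float(best[key]["score"]):
            best[key] = row
    return best


for (pdf_name, kpi), row in best_answers(SAMPLE).items():
    print(pdf_name, kpi, row["answer"], row["score"])
```

To run this on a real result file, pass its text content instead of `SAMPLE`.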


In [17]:
# upload the predicted files to s3
s3c.upload_files_in_dir_to_prefix(
    infer_config.result_dir['Text'],
    config.BASE_INFER_KPI_S3_PREFIX
)
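Before uploading, it can be useful to confirm that each relevance CSV produced a KPI CSV. The log output above suggests that `<report>_predictions_relevant.csv` becomes `<report>_predictions_kpi.csv`. The sketch below assumes that pattern holds for every report; `kpi_csv_name` is a hypothetical helper.

```python
def kpi_csv_name(relevance_csv_name):
    """Map '<report>_predictions_relevant.csv' to '<report>_predictions_kpi.csv'."""
    suffix = "_predictions_relevant.csv"
    if not relevance_csv_name.endswith(suffix):
        raise ValueError(f"Not a relevance prediction file: {relevance_csv_name}")
    return relevance_csv_name[: -len(suffix)] + "_predictions_kpi.csv"


# File name taken from the log output above.
expected = kpi_csv_name("75506106_BOA_2016-12-31_predictions_relevant.csv")
print(expected)
```

Checking that each expected name exists in `infer_config.result_dir['Text']` would flag reports that were skipped during inference.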

# Conclusion
This notebook ran KPI inference on a sample dataset, saved the output in CSV format, and uploaded the result files to S3.