In [1]:
import os

from model_pipeline import FARMTrainer
from model_pipeline import ModelConfig, TokenizerConfig, TrainingConfig, FileConfig, MLFlowConfig, ProcessorConfig

07/24/2020 03:52:20 - INFO - transformers.file_utils -   PyTorch version 1.5.0 available.


## Training Pipeline

The training pipeline trains the relevance classifier once the dataset has been extracted and curated. The model is built on a transformer (e.g., BERT) that is loaded into the pipeline after being pre-trained on the NQ dataset and is then fine-tuned on the curated data for our specific relevance detection task.

Our pipeline includes components provided by the FARM library. FARM is a framework that facilitates transfer learning for BERT-based models. Documentation for FARM is available at https://farm.deepset.ai.



For our demo, we use the curated data generated after receiving the latest set of annotations provided by the Allianz ESG team.

#### Set parameters

Before training starts, the parameters for each component of the training pipeline must be set. For this we create `config` objects that hold these parameters. Default values are already set, but they can easily be changed.
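As a rough illustration of this pattern, a config object with default values could look like the sketch below. `DemoTrainingConfig` is a hypothetical stand-in, not the pipeline's actual `TrainingConfig`; the field names and defaults simply mirror the values printed later in this notebook.

```python
from dataclasses import dataclass, replace


@dataclass
class DemoTrainingConfig:
    """Hypothetical stand-in for a config object; not the pipeline's TrainingConfig."""

    learning_rate: float = 1e-5
    n_epochs: int = 1
    batch_size: int = 16
    run_cv: bool = False


# Start from the defaults, then override a single field by attribute assignment.
demo_config = DemoTrainingConfig()
demo_config.n_epochs = 3

# dataclasses.replace returns a modified copy and leaves the original untouched.
cv_config = replace(demo_config, run_cv=True)
print(demo_config)
print(cv_config)
```

Keeping the defaults in one place and overriding only what differs makes it easier to see which settings an experiment actually changed.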

In [2]:
file_config = FileConfig()  # Settings for data files and checkpoints
processor_config = ProcessorConfig()  # Settings for the processor component
tokenizer_config = TokenizerConfig()  # Settings for the tokenizer
model_config = ModelConfig()  # Settings for the model
train_config = TrainingConfig()  # Settings for training
mlflow_config = MLFlowConfig()  # Settings for MLflow experiment tracking

Parameters can be changed as follows:

In [3]:
file_config.experiment_name = "demo_training"

However, we advise that you manually update the parameters in the corresponding config file:

`esg_data_pipeline/config/config_farm_trainer.py`

We can check the value for some parameters:

In [4]:
print(f"Experiment_name: \n {file_config.experiment_name} \n")
print(f"Data directory: \n {file_config.data_dir} \n")
print(f"Curated dataset path: \n {file_config.curated_data} \n")
print(f"Split train/validation ratio: \n{file_config.test_split} \n")
print(f"Training dataset path: \n {file_config.train_filename} \n")
print(f"Validation dataset path: \n {file_config.dev_filename} \n")
print(f"Directory where trained model is saved: \n {file_config.saved_models_dir} \n")

Experiment_name: 
 demo_training 

Data directory: 
 /model_pipeline/model_pipeline/data 

Curated dataset path: 
 /model_pipeline/model_pipeline/data/esg_dataset.csv 

Split train/validation ratio: 
0.2 

Training dataset path: 
 /model_pipeline/model_pipeline/data/train_split_02.csv 

Validation dataset path: 
 /model_pipeline/model_pipeline/data/val_split_02.csv 

Directory where trained model is saved: 
 saved_models/test_farm 
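The split ratio above determines how many curated rows are held out for validation. The sketch below shows one way such a split could be done. The helper `split_rows`, the seed, and rounding the validation size up are all assumptions, not the pipeline's confirmed behaviour. The 1,496 rows are chosen to equal the sum of the train and dev counts that appear in the log output later in this notebook.

```python
import math
import random


def split_rows(rows, test_split=0.2, seed=42):
    """Shuffle rows and hold out a `test_split` fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the validation size is rounded up to a whole row.
    n_val = math.ceil(len(shuffled) * test_split)
    return shuffled[n_val:], shuffled[:n_val]


train_rows, val_rows = split_rows(range(1496), test_split=0.2)
print(len(train_rows), len(val_rows))  # 1196 300
```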



In [5]:
print(f"Max number of tokens per example: {processor_config.max_seq_len} \n")

Max number of tokens per example: 512 
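`max_seq_len` caps the number of tokens per example, so longer question/passage pairs have to be shortened. The sketch below shows one common strategy, longest-first truncation. Whether the processor uses exactly this strategy is an assumption. `truncate_pair` and `n_special_tokens=4` (the count for a RoBERTa-style pair encoding) are illustrative and not part of the pipeline.

```python
def truncate_pair(tokens_a, tokens_b, max_seq_len=512, n_special_tokens=4):
    """Trim the longer sequence, one token at a time, until the pair fits."""
    budget = max_seq_len - n_special_tokens
    if budget < 0:
        raise ValueError("max_seq_len is smaller than the number of special tokens")
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    while len(tokens_a) + len(tokens_b) > budget:
        # Always shorten whichever sequence is currently longer.
        longer = tokens_a if len(tokens_a) >= len(tokens_b) else tokens_b
        longer.pop()
    return tokens_a, tokens_b


question = ["tok"] * 17   # short question
passage = ["tok"] * 600   # passage that exceeds the limit on its own
short_q, short_p = truncate_pair(question, passage)
print(len(short_q), len(short_p))  # 17 491
```

With this strategy the short question is kept intact and only the tail of the passage is dropped.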



In [6]:
print(f"Use GPU: {train_config.use_cuda} \n")

Use GPU: True 



In [7]:
print(f"Learning_rate: {train_config.learning_rate} \n")
print(f"Number of epochs for fine tuning: {train_config.n_epochs} \n")
print(f"Batch size: {train_config.batch_size} \n")
print(f"Perform Cross validation: {train_config.run_cv} \n")

Learning_rate: 1e-05 

Number of epochs for fine tuning: 1 

Batch size: 16 

Perform Cross validation: False 



#### Load model trained on NQ dataset

We have already trained a relevance classifier on Google's large NQ dataset. We then saved the model in the following directory: `file_config.saved_models_dir / "relevance_roberta"`

We need to load this model in our pipeline to fine-tune a relevance classifier on our specific ESG curated dataset. For this we have to set the parameter `model_config.load_dir` to be the directory where we saved our first checkpoint. We can check that this is set:

In [8]:
print(f"NQ checkpoint directory: {model_config.load_dir}")

NQ checkpoint directory: /model_pipeline/model_pipeline/saved_models/relevance_roberta


#### Fine-tune on curated ESG data

Once all the parameters are set, a `FARMTrainer` object can be instantiated by passing in all the configuration objects:

In [9]:
farm_trainer = FARMTrainer(
    file_config=file_config,
    tokenizer_config=tokenizer_config,
    model_config=model_config,
    processor_config=processor_config,
    training_config=train_config,
    mlflow_config=mlflow_config,
)

Call the `run()` method to start training:

In [10]:
farm_trainer.run()

07/24/2020 03:52:32 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: True
07/24/2020 03:52:32 - INFO - farm.modeling.tokenization -   Loading tokenizer of type 'RobertaTokenizer'
07/24/2020 03:52:33 - INFO - transformers.tokenization_utils_base -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-vocab.json from cache at /root/.cache/torch/transformers/d0c5776499adc1ded22493fae699da0971c1ee4c2587111707a4d177d20257a2.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
07/24/2020 03:52:33 - INFO - transformers.tokenization_utils_base -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-merges.txt from cache at /root/.cache/torch/transformers/b35e7cd126cd4229a746b5d5c29a749e8e84438b14bcdb575950584fe33207e8.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
The git executable must be specified in one of the following ways:
    - be included in 

07/24/2020 03:52:34 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 2-0
Clear Text: 
 	text: What is the total amount of direct greenhouse gases emissions referred to as scope 1 emissions?
 	text_b: Breakdown of petroleum products delivery to consignees
 	text_classification_label: 0
Tokenized: 
 	tokens: ['What', 'Ġis', 'Ġthe', 'Ġtotal', 'Ġamount', 'Ġof', 'Ġdirect', 'Ġgreenhouse', 'Ġgases', 'Ġemissions', 'Ġreferred', 'Ġto', 'Ġas', 'Ġscope', 'Ġ1', 'Ġemissions', '?']
 	tokens_b: ['Break', 'down', 'Ġof', 'Ġpetroleum', 'Ġproducts', 'Ġdelivery',

07/24/2020 03:52:40 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 1-0
Clear Text: 
 	text: What is the total volume of hydrocarbons production?
 	text_b: Our regulations and measures for transportation and warehouse safety cover the delivery of raw materials, the storage and distribution of chemical products among BASF sites and customers, and the transportation of waste from our sites to the disposal facilities.
 	text_classification_label: 0
Tokenized: 
 	tokens: ['What', 'Ġis', 'Ġthe', 'Ġtotal', 'Ġvolume', 'Ġof', 'Ġhydro', 'car', 'bons'

07/24/2020 03:52:56 - INFO - transformers.modeling_utils -   loading weights file /model_pipeline/model_pipeline/saved_models/relevance_roberta/language_model.bin from cache at /model_pipeline/model_pipeline/saved_models/relevance_roberta/language_model.bin
07/24/2020 03:53:01 - INFO - transformers.modeling_utils -   All model checkpoint weights were used when initializing RobertaModel.

07/24/2020 03:53:01 - INFO - transformers.modeling_utils -   All the weights of RobertaModel were initialized from the model checkpoint at /model_pipeline/model_pipeline/saved_models/relevance_roberta/language_model.bin.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use RobertaModel for predictions without further training.
07/24/2020 03:53:01 - INFO - farm.modeling.adaptive_model -   Found files for loading 1 prediction heads
07/24/2020 03:53:01 - INFO - farm.modeling.prediction_head -   Prediction head initialized with size [768, 2]
07/24/2020 03:53:0

Trained model saved to saved_models/test_farm
Processor vocabulary saved to saved_models/test_farm


At the end of training, the model and the processor vocabulary are saved to the directory `file_config.saved_models_dir`:

In [18]:
!ls -al $file_config.saved_models_dir

total 488308
drwxr-xr-x 2 root root      4096 Jul 24 03:54 .
drwxr-xr-x 3 root root      4096 Jul 24 03:54 ..
-rw-r--r-- 1 root root 498630327 Jul 24 03:54 language_model.bin
-rw-r--r-- 1 root root       562 Jul 24 03:54 language_model_config.json
-rw-r--r-- 1 root root    456318 Jul 24 03:54 merges.txt
-rw-r--r-- 1 root root      6879 Jul 24 03:54 prediction_head_0.bin
-rw-r--r-- 1 root root       556 Jul 24 03:54 prediction_head_0_config.json
-rw-r--r-- 1 root root       727 Jul 24 03:54 processor_config.json
-rw-r--r-- 1 root root       772 Jul 24 03:54 special_tokens_map.json
-rw-r--r-- 1 root root       167 Jul 24 03:54 tokenizer_config.json
-rw-r--r-- 1 root root    898822 Jul 24 03:54 vocab.json
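Before loading a checkpoint it can help to confirm that the directory is complete. The sketch below checks for a subset of the file names from the listing above. The helper `missing_checkpoint_files` is hypothetical, and the assumption that all of these files are required for loading is ours.

```python
import tempfile
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "language_model.bin",
    "language_model_config.json",
    "prediction_head_0.bin",
    "prediction_head_0_config.json",
    "processor_config.json",
    "vocab.json",
    "merges.txt",
]


def missing_checkpoint_files(model_dir):
    """Return the expected file names that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


# Demo on a temporary directory that contains only one of the expected files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "vocab.json").write_text("{}")
    missing = missing_checkpoint_files(tmp)
print(missing)
```

In the notebook, the same check could be run as `missing_checkpoint_files(file_config.saved_models_dir)`; an empty list means every expected file is present.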


## Cross-validation

To better estimate how the model performs on new data, we recommend k-fold cross-validation (CV). CV works as follows:

- Split the entire dataset randomly into k folds (usually 5 to 10).
- Fit the model on k - 1 folds, validate it on the remaining fold, and save the scores.
- Repeat until each fold has served as the validation set once, then average the saved scores.
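The steps above can be sketched in plain Python as follows. This is a generic illustration of k-fold splitting, not the code used inside the trainer. `k_fold_indices`, `cross_validate`, and the dummy scoring function are hypothetical names introduced for the example.

```python
import random


def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


def cross_validate(n_samples, fit_and_score, k=3):
    """Average the score returned by `fit_and_score(train_idx, val_idx)` over k folds."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores), scores


# Dummy scorer: reports the validation fold's share of the data.
mean_score, fold_scores = cross_validate(10, lambda tr, va: len(va) / 10, k=3)
print(fold_scores, mean_score)
```

In a real run, `fit_and_score` would train a fresh model on the training indices and return a metric such as F1 on the validation indices.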

_FARMTrainer_ includes this feature. To perform 3-fold CV, proceed as follows:

In [24]:
train_config.run_cv = True
train_config.xval_folds = 3
train_config.n_epochs = 3

In [25]:
farm_trainer = FARMTrainer(
    file_config=file_config,
    tokenizer_config=tokenizer_config,
    model_config=model_config,
    processor_config=processor_config,
    training_config=train_config,
    mlflow_config=mlflow_config,
)

In [26]:
farm_trainer.run()

07/24/2020 01:00:27 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: True
07/24/2020 01:00:27 - INFO - farm.modeling.tokenization -   Loading tokenizer of type 'RobertaTokenizer'
07/24/2020 01:00:27 - INFO - transformers.tokenization_utils_base -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-vocab.json from cache at /home/ehsanmontazeri/.cache/torch/transformers/d0c5776499adc1ded22493fae699da0971c1ee4c2587111707a4d177d20257a2.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
07/24/2020 01:00:27 - INFO - transformers.tokenization_utils_base -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-merges.txt from cache at /home/ehsanmontazeri/.cache/torch/transformers/b35e7cd126cd4229a746b5d5c29a749e8e84438b14bcdb575950584fe33207e8.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
07/24/2020 01:00:28 - INFO - farm.data_handler.data_silo -

_____________________________________________________
Preprocessing Dataset /home/ehsanmontazeri/Allianz_NLP/esg_data_pipeline/esg_data_pipeline/data/train_split_02.csv: 100%|██████████| 1196/1196 [00:03<00:00, 390.09 Dicts/s]
07/24/2020 01:00:31 - INFO - farm.data_handler.data_silo -   Loading dev set from: /home/ehsanmontazeri/Allianz_NLP/esg_data_pipeline/esg_data_pipeline/data/val_split_02.csv
07/24/2020 01:00:31 - INFO - farm.data_handler.data_silo -   Got ya 3 parallel workers to convert 300 dictionaries to pytorch datasets (chunksize = 20)...
07/24/2020 01:00:31 - INFO - farm.data_handler.data_silo -    0    0    0 
07/24/2020 01:00:31 - INFO - farm.data_handler.data_silo -   /w\  /|\  /w\
07/24/2020 01:00:31 - INFO - farm.data_handler.data_silo -   / \  /'\  /'\
07/24/2020 01:00:31 - INFO - farm.data_handler.data_silo -       
Preprocessing Dataset /home/ehsanmontazeri/Allianz_NLP/esg_data_pipeline/esg_data_pipeline/data/val_split_02.csv:   0%|          | 0/300 [00:00<?, ? Dict

_____________________________________________________
Preprocessing Dataset /home/ehsanmontazeri/Allianz_NLP/esg_data_pipeline/esg_data_pipeline/data/val_split_02.csv: 100%|██████████| 300/300 [00:01<00:00, 225.69 Dicts/s]
07/24/2020 01:00:33 - INFO - farm.data_handler.data_silo -   No test set is being loaded
07/24/2020 01:00:33 - INFO - farm.data_handler.data_silo -   Examples in train: 1196
07/24/2020 01:00:33 - INFO - farm.data_handler.data_silo -   Examples in dev  : 300
07/24/2020 01:00:33 - INFO - farm.data_handler.data_silo -   Examples in test : 0
07/24/2020 01:00:33 - INFO - farm.data_handler.data_silo -   
07/24/2020 01:00:33 - INFO - farm.data_handler.data_silo -   Longest sequence length observed after clipping:     512
07/24/2020 01:00:33 - INFO - farm.data_handler.data_silo -   Average sequence length after clipping: 68.60367892976589
07/24/2020 01:00:33 - INFO - farm.data_handler.data_silo -   Proportion clipped:      0.0016722408026755853
07/24/2020 01:00:33 - INFO - f

############ Crossvalidation: Fold 0 ############


07/24/2020 01:00:33 - INFO - transformers.modeling_utils -   loading weights file https://cdn.huggingface.co/roberta-base-pytorch_model.bin from cache at /home/ehsanmontazeri/.cache/torch/transformers/80b4a484eddeb259bec2f06a6f2f05d90934111628e0e1c09a33bd4a121358e1.49b88ba7ec2c26a7558dda98ca3884c3b80fa31cf43a1b1f23aef3ff81ba344e
07/24/2020 01:00:38 - INFO - transformers.modeling_utils -   All model checkpoint weights were used when initializing RobertaModel.

07/24/2020 01:00:38 - INFO - transformers.modeling_utils -   All the weights of RobertaModel were initialized from the model checkpoint at roberta-base.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use RobertaModel for predictions without further training.
	 We guess it's an *ENGLISH* model ... 
	 If not: Init the language model by supplying the 'language' param.
07/24/2020 01:00:38 - INFO - transformers.modeling_utils -   loading weights file /home/ehsanmontazeri/Allianz_NLP/esg_

07/24/2020 01:02:35 - INFO - farm.eval -   task_name: text_classification
07/24/2020 01:02:35 - INFO - farm.eval -   acc: 0.9396984924623115
07/24/2020 01:02:35 - INFO - farm.eval -   report: 
               precision    recall  f1-score   support

           0     0.9450    0.9450    0.9450       109
           1     0.9333    0.9333    0.9333        90

    accuracy                         0.9397       199
   macro avg     0.9391    0.9391    0.9391       199
weighted avg     0.9397    0.9397    0.9397       199

Train epoch 2/2 (Cur. train loss: 0.0076): 100%|██████████| 50/50 [00:43<00:00,  1.15it/s]
Evaluating: 100%|██████████| 32/32 [00:08<00:00,  3.82it/s]
07/24/2020 01:03:07 - INFO - farm.eval -   

\\|//       \\|//      \\|//       \\|//     \\|//
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
***************************************************
***** EVALUATION | TEST SET | AFTER 150 BATCHES *****
***************************************************
\\|//       \\|//  

############ Crossvalidation: Fold 1 ############


07/24/2020 01:03:11 - INFO - transformers.modeling_utils -   loading weights file https://cdn.huggingface.co/roberta-base-pytorch_model.bin from cache at /home/ehsanmontazeri/.cache/torch/transformers/80b4a484eddeb259bec2f06a6f2f05d90934111628e0e1c09a33bd4a121358e1.49b88ba7ec2c26a7558dda98ca3884c3b80fa31cf43a1b1f23aef3ff81ba344e
07/24/2020 01:03:15 - INFO - transformers.modeling_utils -   All model checkpoint weights were used when initializing RobertaModel.

07/24/2020 01:03:15 - INFO - transformers.modeling_utils -   All the weights of RobertaModel were initialized from the model checkpoint at roberta-base.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use RobertaModel for predictions without further training.
	 We guess it's an *ENGLISH* model ... 
	 If not: Init the language model by supplying the 'language' param.
07/24/2020 01:03:16 - INFO - transformers.modeling_utils -   loading weights file /home/ehsanmontazeri/Allianz_NLP/esg_

07/24/2020 01:05:13 - INFO - farm.eval -   task_name: text_classification
07/24/2020 01:05:13 - INFO - farm.eval -   acc: 0.9396984924623115
07/24/2020 01:05:13 - INFO - farm.eval -   report: 
               precision    recall  f1-score   support

           0     0.9519    0.9340    0.9429       106
           1     0.9263    0.9462    0.9362        93

    accuracy                         0.9397       199
   macro avg     0.9391    0.9401    0.9395       199
weighted avg     0.9400    0.9397    0.9397       199

Train epoch 2/2 (Cur. train loss: 0.0005): 100%|██████████| 50/50 [00:43<00:00,  1.15it/s]
Evaluating: 100%|██████████| 32/32 [00:08<00:00,  3.82it/s]
07/24/2020 01:05:44 - INFO - farm.eval -   

\\|//       \\|//      \\|//       \\|//     \\|//
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
***************************************************
***** EVALUATION | TEST SET | AFTER 150 BATCHES *****
***************************************************
\\|//       \\|//  

############ Crossvalidation: Fold 2 ############


07/24/2020 01:05:49 - INFO - transformers.modeling_utils -   loading weights file https://cdn.huggingface.co/roberta-base-pytorch_model.bin from cache at /home/ehsanmontazeri/.cache/torch/transformers/80b4a484eddeb259bec2f06a6f2f05d90934111628e0e1c09a33bd4a121358e1.49b88ba7ec2c26a7558dda98ca3884c3b80fa31cf43a1b1f23aef3ff81ba344e
07/24/2020 01:05:53 - INFO - transformers.modeling_utils -   All model checkpoint weights were used when initializing RobertaModel.

07/24/2020 01:05:53 - INFO - transformers.modeling_utils -   All the weights of RobertaModel were initialized from the model checkpoint at roberta-base.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use RobertaModel for predictions without further training.
	 We guess it's an *ENGLISH* model ... 
	 If not: Init the language model by supplying the 'language' param.
07/24/2020 01:05:53 - INFO - transformers.modeling_utils -   loading weights file /home/ehsanmontazeri/Allianz_NLP/esg_

07/24/2020 01:07:50 - INFO - farm.eval -   task_name: text_classification
07/24/2020 01:07:50 - INFO - farm.eval -   acc: 0.9447236180904522
07/24/2020 01:07:50 - INFO - farm.eval -   report: 
               precision    recall  f1-score   support

           0     0.9519    0.9429    0.9474       105
           1     0.9368    0.9468    0.9418        94

    accuracy                         0.9447       199
   macro avg     0.9444    0.9448    0.9446       199
weighted avg     0.9448    0.9447    0.9447       199

Train epoch 2/2 (Cur. train loss: 0.0190): 100%|██████████| 50/50 [00:43<00:00,  1.14it/s]
Evaluating: 100%|██████████| 32/32 [00:08<00:00,  3.82it/s]
07/24/2020 01:08:22 - INFO - farm.eval -   

\\|//       \\|//      \\|//       \\|//     \\|//
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
***************************************************
***** EVALUATION | TEST SET | AFTER 150 BATCHES *****
***************************************************
\\|//       \\|//  

############ RESULT_CV -- 3 folds ############
Mean F1:  93.9, std F1: 0.006
Mean recall:  93.2, std recall: 0.004
Mean accuracy:  94.3, std accuracy; 0.006
Mean precision:  94.6, std  precision: 0.015
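To make the reported metrics concrete, the sketch below computes precision, recall, F1 and accuracy from confusion-matrix counts. The counts are inferred from the first classification report in the log above (support 109 for class 0 and 90 for class 1). The log does not print a confusion matrix, so treat them as a reconstruction. `binary_metrics` is a hypothetical helper, not part of FARM.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall and F1 for the positive class, plus overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts inferred from the first report: 84 of 90 relevant pairs found,
# 103 of 109 non-relevant pairs rejected.
first_fold = binary_metrics(tp=84, fp=6, fn=6, tn=103)
print({name: round(value, 4) for name, value in first_fold.items()})
```

Rounded to four decimals, these counts reproduce the class 1 row (0.9333 for precision, recall and F1) and the accuracy (0.9397) of that report.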





Note: CV mode does not save a checkpoint; it is only used for validation.

## Inference

We can use the saved model and test it on some real examples.

In [11]:
from farm.infer import Inferencer

In [12]:
model = Inferencer.load(file_config.saved_models_dir, gpu=True)

07/24/2020 03:54:45 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
07/24/2020 03:54:45 - INFO - transformers.modeling_utils -   loading weights file saved_models/test_farm/language_model.bin from cache at saved_models/test_farm/language_model.bin
07/24/2020 03:54:51 - INFO - transformers.modeling_utils -   All model checkpoint weights were used when initializing RobertaModel.

07/24/2020 03:54:51 - INFO - transformers.modeling_utils -   All the weights of RobertaModel were initialized from the model checkpoint at saved_models/test_farm/language_model.bin.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use RobertaModel for predictions without further training.
07/24/2020 03:54:51 - INFO - farm.modeling.adaptive_model -   Found files for loading 1 prediction heads
07/24/2020 03:54:51 - INFO - farm.modeling.prediction_head -   Prediction head initialized with size [768, 2

07/24/2020 03:55:32 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 0-0
Clear Text: 
 	text: What is the climate commitment scenario considered?
 	text_b: AGL’s approach to transitioning to a low-carbon future is set out within the AGL Greenhouse Gas Policy. This policy acknowledges that Australia is moving to a carbon-constrained future and provides a framework within which greenhouse gas reduction activities will be structured, presenting a pathway for the gradual decarbonisation of AGL’s generation portfolio by mid-century. The commitment

07/24/2020 03:55:35 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 0-0
Clear Text: 
 	text: What is the climate commitment scenario considered?
 	text_b: 1. There is a transparent procedure implemented in the Company that provides the shareholders with the opportunity to send questions to the Chairman of the Board of Directors and express their position.
Tokenized: 
 	tokens: ['What', 'Ġis', 'Ġthe', 'Ġclimate', 'Ġcommitment', 'Ġscenario', 'Ġconsidered', '?']
 	tokens_b: ['1', '.', 'ĠThere', 'Ġis', 'Ġa', 'Ġtransparent', 'Ġprocedure', 'Ġimple

07/24/2020 03:55:37 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 0-0
Clear Text: 
 	text: What is the target carbon reduction in percentage?
 	text_b: In 2018 we signed the Business in the Community Ireland Low Carbon Pledge, agreeing to reduce greenhouse gas emissions by half by 2030. We also co-chair the Transition to a Low Carbon Economy Group, comprising representatives from some of the companies who hold the Businesses Working Responsibly Mark. The Group meets regularly to agree collaborative action to improve the sustainability of t

07/24/2020 03:55:38 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 0-0
Clear Text: 
 	text: What is the target carbon reduction in percentage?
 	text_b: In 2017, the program was extended to Abu Dhabi in the Middle East and Africa region, to two OMV Petrom (Romania) sites and eight main contractors from Upstream and Downstream in OMV Petrom. Interviews and focus group discussions with the management and employees at all levels of the business have provided the current picture of our safety culture and helped to understand the origins of our 

In [13]:
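def get_inference_results(example):
    """Print whether the passage in `example` is relevant to its question."""
    # Run the model on a single question/passage pair.
    result = model.inference_from_dicts(dicts=[example], return_json=False)
    # Label "1" marks a relevant pair; any other label is treated as not relevant.
    label = result[0]["predictions"][0]["label"]
    print("Relevant" if label == "1" else "Not Relevant")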
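When several pairs are scored at once, the same label mapping can be applied to every prediction. The sketch below assumes the result structure implied by the indexing in the function above (a list of dicts, each with a `"predictions"` list whose items carry a `"label"`). `labels_from_results` and the hand-written `fake_results` are hypothetical and stand in for real `Inferencer` output.

```python
LABEL_NAMES = {"1": "Relevant", "0": "Not Relevant"}


def labels_from_results(results):
    """Flatten batched results into a list of human-readable labels."""
    return [
        LABEL_NAMES.get(str(pred["label"]), "Unknown")
        for batch in results
        for pred in batch["predictions"]
    ]


# Hand-written stand-in for the model output; only the fields read here are included.
fake_results = [{"predictions": [{"label": "1"}, {"label": "0"}]}]
print(labels_from_results(fake_results))  # ['Relevant', 'Not Relevant']
```

Unlike the function above, this variant reports unexpected labels as "Unknown" instead of folding them into "Not Relevant", which can make labelling problems easier to spot.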

In [14]:
example = {
    "text": "What is the climate commitment scenario considered?",
    "text_b": "AGL’s approach to transitioning to a low-carbon future is set out within "
    "the AGL Greenhouse Gas Policy. This policy acknowledges that Australia is moving to a "
    "carbon-constrained future and provides a framework within which greenhouse gas reduction "
    "activities will be structured, presenting a pathway for the gradual decarbonisation of "
    "AGL’s generation portfolio by mid-century. The commitments of AGL within this policy are "
    "not inconsistent with the goal of the Paris Agreement to limit warming to below 2 degrees "
    "celsius above pre-industrial levels.",
}

get_inference_results(example)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 11.71 Batches/s]

Relevant





In [15]:
example = {
    "text": "What is the climate commitment scenario considered?",
    "text_b": "1. There is a transparent procedure implemented in the Company that provides the "
    "shareholders with the opportunity to send questions to the Chairman of the Board of Directors "
    "and express their position.",
}
get_inference_results(example)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 50.40 Batches/s]

Not Relevant





In [16]:
example = {
    "text": "What is the target carbon reduction in percentage?",
    "text_b": "In 2018 we signed the Business in the Community Ireland Low Carbon Pledge, "
    "agreeing to reduce greenhouse gas emissions by half by 2030. We also co-chair the Transition to a "
    "Low Carbon Economy Group, comprising representatives from some of the companies who hold the "
    "Businesses Working Responsibly Mark. The Group meets regularly to agree collaborative action to "
    "improve the sustainability of the Irish business sector.",
}
get_inference_results(example)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 48.69 Batches/s]

Relevant





In [17]:
example = {
    "text": "What is the target carbon reduction in percentage?",
    "text_b": "In 2017, the program was extended to Abu Dhabi in the Middle East and Africa region, "
    "to two OMV Petrom (Romania) sites and eight main contractors from Upstream and "
    "Downstream in OMV Petrom. Interviews and focus group discussions with the management "
    "and employees at all levels of the business have provided the current picture of our "
    "safety culture and helped to understand the origins of our daily decisions and behavior. "
    "The program continued with train- ing of selected employees, local management and supervisors "
    "in the chosen locations.",
}
get_inference_results(example)

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 58.04 Batches/s]

Not Relevant



