*Copyright (c) Microsoft Corporation. All rights reserved.*

*Licensed under the MIT License.*

# The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding


## Summary

MT-DNN is an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models. Built upon PyTorch and Transformers, MT-DNN is designed to facilitate rapid
customization for a broad spectrum of NLU tasks, using a variety of objectives (classification, regression, structured prediction) and text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm. To enable efficient production deployment, MT-DNN supports multitask knowledge distillation, which can substantially compress a deep neural model without significant performance drop. We demonstrate the effectiveness of MT-DNN on a wide range of NLU applications across general and biomedical domains. The pip installable package and pretrained models will be publicly available at https://github.com/microsoft/mt-dnn.

### Design

MT-DNN is designed for modularity, flexibility, and ease of use. Its modules are built upon PyTorch (Paszke et al., 2019) and Transformers (Wolf et al., 2019), allowing the use of SOTA pretrained models, e.g., BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019c) and UniLM (Dong et al., 2019). The unique attribute of this package is a flexible interface for adversarial multi-task fine-tuning and knowledge distillation, so that researchers and developers can build large SOTA NLU models and then compress them into small ones for online deployment. The overall workflow and system architecture are shown in figures 1 and 3 respectively.


![Workflow Design](https://nlpbp.blob.core.windows.net/images/mt-dnn2.JPG)

The figure above shows the workflow of MT-DNN: train a neural language model on a large amount of unlabeled raw text to obtain general contextual representations; then fine-tune the learned contextual representations on downstream tasks, e.g., GLUE (Wang et al., 2018); lastly, distill this large model into a lighter one for online deployment. In the latter two phases, we can leverage multi-task learning and adversarial training to further improve performance.

## Architecture

![overall_arch](https://nlpbp.blob.core.windows.net/images/mt-dnn.png)

The figure above shows the overall system architecture. The lower layers are shared across all tasks, while the top layers are task-specific. The input X (either a sentence or a set of sentences) is first represented as a sequence of embedding vectors, one for each word, in l1. Then the encoder, e.g., a Transformer or a recurrent neural network (LSTM) model, captures the contextual information for each word and generates the shared contextual embedding vectors in l2. Finally, for each task, additional task-specific layers generate task-specific representations, followed by the operations necessary for classification, similarity scoring, or relevance ranking. For adversarial training, we perturb the embeddings from the lexicon encoder and add an extra loss term during training. Note that perturbations are not required in the inference phase.
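
The layering described above can be illustrated with a toy, framework-free sketch. It is not the toolkit's implementation: the random embedding table, the averaging `shared_encoder`, and the `make_head` helper are hypothetical stand-ins, chosen only to show one shared representation feeding several task-specific output layers. A real encoder would produce one contextual vector per word rather than a single averaged vector.

```python
import random

random.seed(0)
EMBED_DIM = 4

# l1 stand-in: one embedding vector per word (toy random table)
vocab = ["mt", "dnn", "is", "a", "toolkit"]
embeddings = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in vocab}

def shared_encoder(tokens):
    """l2 stand-in: average the word embeddings into one shared vector."""
    vectors = [embeddings[t] for t in tokens]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def make_head(n_class):
    """Task-specific layer: a random linear map from EMBED_DIM to n_class scores."""
    weights = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(n_class)]
    return lambda h: [sum(w * x for w, x in zip(row, h)) for row in weights]

# One head per task; every head reads the same shared representation
heads = {"mrpc": make_head(2), "sst": make_head(2), "mnli": make_head(3)}

shared = shared_encoder(["mt", "dnn", "is", "a", "toolkit"])
scores = {task: head(shared) for task, head in heads.items()}
for task, task_scores in scores.items():
    print(task, len(task_scores))
```

Only the heads differ between tasks, so adding a task in this sketch means adding one more entry to `heads` while the encoder stays untouched.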

## Introduction
In this notebook, we fine-tune and evaluate MT-DNN models on a subset of the [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) dataset.  

### Running Time

This is a __computationally intensive__ notebook that runs on the entire MNLI dataset, covering the matched and mismatched splits for training, development, and test.

The table below provides some reference running time on a GPU machine.  

|Dataset|MULTI_GPU_ON|Machine Configurations|Running time|
|:------|:---------|:----------------------|:------------|
|MultiNLI|True|4 NVIDIA Tesla K80 GPUs, 24GB GPU memory| ~ 20 hours |

If you run into a CUDA out-of-memory error or the Jupyter kernel dies repeatedly, try reducing `BATCH_SIZE` and `MAX_SEQ_LEN` in `MTDNNConfig`, but note that model performance may be compromised.
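
As a rough guide to how much each setting helps, the number of token positions in a padded batch is `BATCH_SIZE * MAX_SEQ_LEN`. The sketch below uses that product as a crude proxy for activation memory. It ignores model weights, optimizer state, and the quadratic cost of self-attention, and the helper name `tokens_per_batch` is made up for illustration.

```python
def tokens_per_batch(batch_size: int, max_seq_len: int) -> int:
    """Number of token positions in one fully padded batch."""
    return batch_size * max_seq_len

baseline = tokens_per_batch(16, 128)        # settings used in this notebook
smaller_batch = tokens_per_batch(8, 128)    # halve BATCH_SIZE
shorter_inputs = tokens_per_batch(16, 64)   # halve MAX_SEQ_LEN

for label, value in [("baseline", baseline),
                     ("batch 8", smaller_batch),
                     ("seq len 64", shorter_inputs)]:
    print(f"{label:>10}: {value} tokens ({value / baseline:.0%} of baseline)")
```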


## PyTorch Setup 

In [None]:
# uninstall the default 1.4.0 and run the following command (due to the config of our dev box)
# !pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html


### Text Classification of MultiNLI Sentences using MT-DNN

This notebook utilizes the pip installable package that implements the Multi-Task Deep Neural Network Toolkit (MTDNN) for Natural Language Understanding. It's recommended to run this notebook on GPU machines as it's very computationally intensive.

In [1]:
%load_ext autoreload
%autoreload 2

In [18]:
import json
import os
import shutil
import sys
from tempfile import TemporaryDirectory

import pandas as pd
import torch

from mtdnn.common.types import EncoderModelType
from mtdnn.configuration_mtdnn import MTDNNConfig
from mtdnn.data_builder_mtdnn import MTDNNDataBuilder
from mtdnn.modeling_mtdnn import MTDNNModel
from mtdnn.process_mtdnn import MTDNNDataProcess
from mtdnn.tasks.config import MTDNNTaskDefs
from mtdnn.tokenizer_mtdnn import MTDNNTokenizer

## Define Configuration, Tasks and Model Objects

In [3]:
# Define Configuration, Tasks and Model Objects
# Only the temporary path is kept here; the directories are created explicitly below
ROOT_DIR = TemporaryDirectory().name
OUTPUT_DIR = os.path.join(ROOT_DIR, 'checkpoint')
os.makedirs(OUTPUT_DIR, exist_ok=True)

LOG_DIR = os.path.join(ROOT_DIR, 'tensorboard_logdir')
os.makedirs(LOG_DIR, exist_ok=True)

DATA_DIR = "../../../glue_data/"
TASK_DATA_DIRS = {
    'qqp': os.path.join(DATA_DIR, "QQP"),
    'mnli': os.path.join(DATA_DIR, "MNLI"),
    'sst': os.path.join(DATA_DIR, "SST-2"),
    'mrpc': os.path.join(DATA_DIR, "MRPC")
    }


# Training parameters
BATCH_SIZE = 16
MULTI_GPU_ON = False
MAX_SEQ_LEN = 128
NUM_EPOCHS = 5

In [4]:
TASK_DATA_DIRS

{'qqp': '../../../glue_data/QQP',
 'mnli': '../../../glue_data/MNLI',
 'sst': '../../../glue_data/SST-2',
 'mrpc': '../../../glue_data/MRPC'}

Inspect the locations where the model checkpoints and the TensorBoard logs will be written.

In [5]:
print(OUTPUT_DIR)
print(LOG_DIR)

/tmp/tmpkl3fv4wv/checkpoint
/tmp/tmpkl3fv4wv/tensorboard_logdir


### Define a Configuration Object 

Create a model configuration object, `MTDNNConfig`, with the parameters needed to initialize the MT-DNN model. Initializing it without any parameters falls back to a default configuration similar to the one used to initialize a BERT model.

In [6]:
config = MTDNNConfig(batch_size=BATCH_SIZE, 
                     max_seq_len=MAX_SEQ_LEN, 
                     multi_gpu_on=MULTI_GPU_ON)


### Create Task Definition Object  

Define the parameters of the task(s) to train on in a dictionary and use it to initialize an `MTDNNTaskDefs` object. The definition can cover a single task or multiple tasks. `MTDNNTaskDefs` accepts a Python dict, or a YAML or JSON file, containing the task definition(s).

Each task's `data_source_dir` points to the directory that holds its data. In this notebook these are the GLUE folders listed in `TASK_DATA_DIRS` under `DATA_DIR`.

Each task's pre-processing is driven by the options set in `data_process_opts`.


In [28]:
default_data_process_opts = {"header": True, "is_train": True, "multi_snli": False,}
default_split_names = ["train", "dev", "test"]
tasks_params = {
#     "mnli": {
#         "data_format": "PremiseAndOneHypothesis",
#         "encoder_type": "BERT",
#         "dropout_p": 0.3,
#         "enable_san": True,
#         "labels": ["contradiction", "neutral", "entailment"],
#         "metric_meta": ["ACC"],
#         "loss": "CeCriterion",
#         "kd_loss": "MseCriterion",
#         "n_class": 3,
#         "split_names": [
#             "train",
#             "dev_matched",
#             "dev_mismatched",
#             "test_matched",
#             "test_mismatched",
#         ],
#         "data_source_dir": TASK_DATA_DIRS['mnli'],
#         "data_process_opts": {"header": True, "is_train": True, "multi_snli": False,},
#         "task_type": "Classification",
#     },
    "mrpc": {
                "task_name": "mrpc",
                "data_format": "PremiseAndOneHypothesis",
                "encoder_type": "BERT",
                "enable_san": True,
                "metric_meta": ["ACC", "F1"],
                "loss": "CeCriterion",
                "kd_loss": "MseCriterion",
                "n_class": 2,
                "split_names": default_split_names,
                "data_source_dir": TASK_DATA_DIRS['mrpc'],
                "data_process_opts": default_data_process_opts,
                "task_type": "Classification",
            },
    "sst": {
                "task_name": "sst",
                "data_format": "PremiseOnly",
                "encoder_type": "BERT",
                "enable_san": False,
                "metric_meta": ["ACC"],
                "loss": "CeCriterion",
                "kd_loss": "MseCriterion",
                "n_class": 2,
                "split_names": default_split_names,
                "data_source_dir": TASK_DATA_DIRS['sst'],
                "data_process_opts": default_data_process_opts,
                "task_type": "Classification",
            },
}

# Define the tasks
task_defs = MTDNNTaskDefs(tasks_params)

11/02/2020 11:53:21 - mtdnn.tasks.config - INFO - Mapping Task attributes
11/02/2020 11:53:21 - mtdnn.tasks.config - INFO - Mapping Task attributes
11/02/2020 11:53:21 - mtdnn.tasks.config - INFO - Configured task definitions - ['mrpc', 'sst']
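
Because task definitions may also come from a JSON file, a dictionary like `tasks_params` can be written to disk with the standard `json` module and reused across runs. The sketch below uses a copy of the `sst` entry and a made-up file name. How `MTDNNTaskDefs` expects the file path to be passed is not shown in this notebook, so that call is left as a commented assumption.

```python
import json
import os
import tempfile

# Copy of the "sst" entry defined above (values taken from this notebook)
sst_task = {
    "sst": {
        "task_name": "sst",
        "data_format": "PremiseOnly",
        "encoder_type": "BERT",
        "enable_san": False,
        "metric_meta": ["ACC"],
        "loss": "CeCriterion",
        "kd_loss": "MseCriterion",
        "n_class": 2,
        "split_names": ["train", "dev", "test"],
        "data_source_dir": "../../../glue_data/SST-2",
        "data_process_opts": {"header": True, "is_train": True, "multi_snli": False},
        "task_type": "Classification",
    }
}

# Hypothetical file name; any writable location works
task_def_path = os.path.join(tempfile.mkdtemp(), "sst_task_def.json")
with open(task_def_path, "w") as f:
    json.dump(sst_task, f, indent=2)

# Round-trip check: the file holds exactly the dictionary we wrote
with open(task_def_path) as f:
    reloaded = json.load(f)

# Assumption: the constructor accepts the file path directly
# task_defs = MTDNNTaskDefs(task_def_path)
```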



### Create the MTDNN Data Tokenizer Object  

Create a data tokenizing object, `MTDNNTokenizer`. Based on the model's initial checkpoint, it wraps the corresponding Hugging Face Transformers tokenizer to encode the data into the MT-DNN format. Its output becomes the input to the data building stage.


In [8]:
tokenizer = MTDNNTokenizer(do_lower_case=True)

#### Testing out the Tokenizer encode function on a sample text
`tokenizer.encode("What NLP toolkit do you recommend", "MT-DNN is a fantastic toolkit")`

In [9]:
# single sentence
tokenizer.encode("What NLP toolkit do you recommend")

([101, 2054, 17953, 2361, 6994, 23615, 2079, 2017, 16755, 102, 102],
 None,
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

In [10]:
# sentence pair
print(tokenizer.encode("What NLP toolkit do you recommend", "MT-DNN is a fantastic toolkit"))

([101, 2054, 17953, 2361, 6994, 23615, 2079, 2017, 16755, 102, 11047, 1011, 1040, 10695, 2003, 1037, 10392, 6994, 23615, 102], None, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
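
In the sentence-pair output above, the first element holds the token ids and the last holds one segment id per token (`0` for the first sentence, `1` for the second). The middle element is `None` here and is left uninterpreted. In the `bert-base-uncased` vocabulary, id `101` is `[CLS]` and id `102` is `[SEP]`. The sketch below splits the pair output by segment id; `split_segments` is an illustrative helper, not part of the toolkit.

```python
CLS_ID, SEP_ID = 101, 102  # special-token ids in the bert-base-uncased vocabulary

# Sentence-pair output copied from the cell above
input_ids = [101, 2054, 17953, 2361, 6994, 23615, 2079, 2017, 16755, 102,
             11047, 1011, 1040, 10695, 2003, 1037, 10392, 6994, 23615, 102]
type_ids = [0] * 10 + [1] * 10

def split_segments(ids, segment_ids):
    """Group token ids by their segment id, preserving order."""
    first = [tok for tok, seg in zip(ids, segment_ids) if seg == 0]
    second = [tok for tok, seg in zip(ids, segment_ids) if seg == 1]
    return first, second

first, second = split_segments(input_ids, type_ids)
print(len(first), len(second))
```

The first segment starts with `[CLS]` and both segments end with `[SEP]`, which is the usual BERT layout for sentence pairs.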


## Data Preprocessing

### Create the Data Builder Object  

Create a data preprocessing object, `MTDNNDataBuilder`. This class converts the data into the MT-DNN format appropriate for each task.

The data builder uses the model tokenizer to create each task's vectorized data, which is later used to build the training, test, and development PyTorch dataloaders.

In [29]:
task_defs.data_paths_map

{'mrpc': {'data_paths': ['../../../glue_data/MRPC/train.tsv',
   '../../../glue_data/MRPC/dev.tsv',
   '../../../glue_data/MRPC/test.tsv'],
  'data_opts': {'header': True, 'is_train': True, 'multi_snli': False}},
 'sst': {'data_paths': ['../../../glue_data/SST-2/train.tsv',
   '../../../glue_data/SST-2/dev.tsv',
   '../../../glue_data/SST-2/test.tsv'],
  'data_opts': {'header': True, 'is_train': True, 'multi_snli': False}}}
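
The paths in this map follow a simple pattern: each task's `data_source_dir` joined with `<split>.tsv` for every entry in `split_names`. The sketch below reproduces the `data_paths` lists shown above under that assumption; `build_data_paths` is a hypothetical helper, and the toolkit's own logic may differ.

```python
import os

DATA_DIR = "../../../glue_data/"
task_dirs = {
    "mrpc": os.path.join(DATA_DIR, "MRPC"),
    "sst": os.path.join(DATA_DIR, "SST-2"),
}
split_names = ["train", "dev", "test"]

def build_data_paths(source_dir, splits):
    """Return one TSV path per split inside source_dir."""
    return [os.path.join(source_dir, f"{split}.tsv") for split in splits]

data_paths = {task: build_data_paths(path, split_names) for task, path in task_dirs.items()}
for task, paths in data_paths.items():
    print(task, paths)
```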

In [30]:
## Load and build data
data_builder = MTDNNDataBuilder(
    tokenizer=tokenizer,
    task_defs=task_defs,
    data_dir='.', #DATA_SOURCE_DIR,
    canonical_data_suffix="canonical_data",
    dump_rows=True,
)

## Build data to MTDNN Format
## Iterable of each specific task and processed data
vectorized_data = data_builder.vectorize()

11/02/2020 11:53:30 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 3668 samples for mrpc at ./canonical_data/mrpc_train.tsv
11/02/2020 11:53:30 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 408 samples for mrpc at ./canonical_data/mrpc_dev.tsv
11/02/2020 11:53:30 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 1725 samples for mrpc at ./canonical_data/mrpc_test.tsv
11/02/2020 11:53:30 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 8000 samples for sst at ./canonical_data/sst_train.tsv
11/02/2020 11:53:30 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 2000 samples for sst at ./canonical_data/sst_dev.tsv
11/02/2020 11:53:30 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 1821 samples for sst at ./canonical_data/sst_test.tsv
mrpc_train
11/02/2020 11:53:30 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'MRPC TRAIN' Task


Building Data For Premise and One Hypothesis: 3668it [00:03, 1034.82it/s]

11/02/2020 11:53:34 - mtdnn.data_builder_mtdnn - INFO - Saving data to ./canonical_data/bert_base_uncased/mrpc_train.json



Saving Data For PremiseAndOneHypothesis: 100%|██████████| 3668/3668 [00:00<00:00, 33142.13it/s]

mrpc_dev
11/02/2020 11:53:34 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'MRPC DEV' Task



Building Data For Premise and One Hypothesis: 408it [00:00, 1013.03it/s]

11/02/2020 11:53:34 - mtdnn.data_builder_mtdnn - INFO - Saving data to ./canonical_data/bert_base_uncased/mrpc_dev.json



Saving Data For PremiseAndOneHypothesis: 100%|██████████| 408/408 [00:00<00:00, 28435.96it/s]

mrpc_test
11/02/2020 11:53:34 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'MRPC TEST' Task



Building Data For Premise and One Hypothesis: 1725it [00:01, 1016.37it/s]

11/02/2020 11:53:36 - mtdnn.data_builder_mtdnn - INFO - Saving data to ./canonical_data/bert_base_uncased/mrpc_test.json



Saving Data For PremiseAndOneHypothesis: 100%|██████████| 1725/1725 [00:00<00:00, 31342.67it/s]

sst_train
11/02/2020 11:53:36 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'SST TRAIN' Task



Building Data For Premise Only: 8000it [00:02, 3489.78it/s]

11/02/2020 11:53:38 - mtdnn.data_builder_mtdnn - INFO - Saving data to ./canonical_data/bert_base_uncased/sst_train.json



Saving Data For PremiseOnly: 100%|██████████| 8000/8000 [00:00<00:00, 76292.15it/s]

sst_dev
11/02/2020 11:53:38 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'SST DEV' Task



Building Data For Premise Only: 2000it [00:00, 2648.57it/s]

11/02/2020 11:53:39 - mtdnn.data_builder_mtdnn - INFO - Saving data to ./canonical_data/bert_base_uncased/sst_dev.json



Saving Data For PremiseOnly: 100%|██████████| 2000/2000 [00:00<00:00, 61990.44it/s]

sst_test
11/02/2020 11:53:39 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'SST TEST' Task



Building Data For Premise Only: 1821it [00:01, 1782.55it/s]

11/02/2020 11:53:40 - mtdnn.data_builder_mtdnn - INFO - Saving data to ./canonical_data/bert_base_uncased/sst_test.json



Saving Data For PremiseOnly: 100%|██████████| 1821/1821 [00:00<00:00, 46184.06it/s]


### Create the Data Processing Object  

Create a data processing object, `MTDNNDataProcess`. It creates the training, test, and development PyTorch dataloaders needed for training and testing. It also exposes the training options required to initialize the model correctly for all tasks.

In [31]:
# Make the Data Preprocess step and update the config with training data updates
data_processor = MTDNNDataProcess(
    config=config, task_defs=task_defs, vectorized_data=vectorized_data
)

11/02/2020 11:53:48 - mtdnn.process_mtdnn - INFO - Starting to process the training data sets
11/02/2020 11:53:48 - mtdnn.process_mtdnn - INFO - Loading mrpc_train as task 0
11/02/2020 11:53:48 - mtdnn.dataset_mtdnn - INFO - Loaded 3668 samples out of 3668
11/02/2020 11:53:48 - mtdnn.process_mtdnn - INFO - Loading sst_train as task 1
11/02/2020 11:53:48 - mtdnn.dataset_mtdnn - INFO - Loaded 8000 samples out of 8000
11/02/2020 11:53:48 - mtdnn.process_mtdnn - INFO - Starting to process the testing data sets
11/02/2020 11:53:48 - mtdnn.process_mtdnn - INFO - Loading mrpc_dev as task 0
11/02/2020 11:53:48 - mtdnn.dataset_mtdnn - INFO - Loaded 408 samples out of 408
11/02/2020 11:53:48 - mtdnn.process_mtdnn - INFO - Loading mrpc_test as task 0
11/02/2020 11:53:48 - mtdnn.dataset_mtdnn - INFO - Loaded 1725 samples out of 1725
11/02/2020 11:53:48 - mtdnn.process_mtdnn - INFO - Loading sst_dev as task 1
11/02/2020 11:53:48 - mtdnn.dataset_mtdnn - INFO - Loaded 2000 samples out of 2000
11/02/2

Retrieve the processed multi-task batch dataloaders for training, development, and test.

In [32]:
multitask_train_dataloader = data_processor.get_train_dataloader()
dev_dataloaders_list = data_processor.get_dev_dataloaders()
test_dataloaders_list = data_processor.get_test_dataloaders()
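
Conceptually, a multi-task training loader interleaves mini-batches from all tasks, with each mini-batch drawn from a single task. The sketch below shows one way to build such a schedule for the two tasks in this notebook. It is an assumption about the general technique, not a description of the internals of `get_train_dataloader`; the function name, the shuffling strategy, and the seed are illustrative.

```python
import math
import random

def multitask_schedule(samples_per_task, batch_size, seed=0):
    """Return a shuffled list of task ids, one entry per mini-batch in an epoch."""
    schedule = []
    for task_id, n_samples in enumerate(samples_per_task):
        # Keep the last partial batch by rounding up
        n_batches = math.ceil(n_samples / batch_size)
        schedule.extend([task_id] * n_batches)
    random.Random(seed).shuffle(schedule)
    return schedule

# Training-set sizes reported in the logs above: mrpc (task 0), sst (task 1)
schedule = multitask_schedule([3668, 8000], batch_size=16)
print(len(schedule), schedule[:10])
```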

Now we can retrieve the training options from the processor and use them to initialize the model.

In [15]:
decoder_opts = data_processor.get_decoder_options_list()
task_types = data_processor.get_task_types_list()
dropout_list = data_processor.get_tasks_dropout_prob_list()
loss_types = data_processor.get_loss_types_list()
kd_loss_types = data_processor.get_kd_loss_types_list()
tasks_nclass_list = data_processor.get_task_nclass_list()

Retrieve the total number of training batches, which is passed to the model as its number of training steps.

In [16]:
num_all_batches = data_processor.get_num_all_batches()
num_all_batches

3650
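
The value `3650` is consistent with five epochs over both training sets at a batch size of 16. The check below is a sketch under two assumptions: the final partial batch of each task is kept (rounding up), and the total is counted over 5 epochs, which matches `NUM_EPOCHS` in this notebook.

```python
import math

BATCH_SIZE = 16
NUM_EPOCHS = 5
train_sizes = {"mrpc": 3668, "sst": 8000}  # sample counts from the logs above

# Batches per task, keeping the last partial batch
batches_per_task = {task: math.ceil(n / BATCH_SIZE) for task, n in train_sizes.items()}
batches_per_epoch = sum(batches_per_task.values())
total_batches = batches_per_epoch * NUM_EPOCHS

print(batches_per_task)   # {'mrpc': 230, 'sst': 500}
print(batches_per_epoch)  # 730
print(total_batches)      # 3650
```

The per-epoch figure of 730 also matches the `Amount of data to go over: 730` line in the training log further below.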

### Instantiate the MTDNN Model

Now we can create the `MTDNNModel`.

In [19]:
model = MTDNNModel(
    config,
    task_defs,
    pretrained_model_name="bert-base-uncased",
    num_train_step=num_all_batches,
    decoder_opts=decoder_opts,
    task_types=task_types,
    dropout_list=dropout_list,
    loss_types=loss_types,
    kd_loss_types=kd_loss_types,
    tasks_nclass_list=tasks_nclass_list,
    multitask_train_dataloader=multitask_train_dataloader,
    dev_dataloaders_list=dev_dataloaders_list,
    test_dataloaders_list=test_dataloaders_list,
    output_dir=OUTPUT_DIR,
    log_dir=LOG_DIR 
)

idx: 0, number of task labels: 2
idx: 1, number of task labels: 2


## Model Finetuning, Prediction and Evaluation

### Fit and fine-tune the model for five epochs

At this point we can fit the MT-DNN model and then create predictions. `fit` takes an optional `epochs` parameter that overrides the number of epochs set in the `MTDNNConfig` object.

In [20]:
model.fit(epochs=NUM_EPOCHS)

11/02/2020 11:28:51 - mtdnn.modeling_mtdnn - INFO - Total number of params: 109485316
11/02/2020 11:28:51 - mtdnn.modeling_mtdnn - INFO - At epoch 0
11/02/2020 11:28:51 - mtdnn.modeling_mtdnn - INFO - Amount of data to go over: 730
11/02/2020 11:28:51 - mtdnn.modeling_mtdnn - INFO - Task - [ 1] Updates - [     1] Training Loss - [0.66444] Time Remaining - [0:02:37]

	add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
	add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
  exp_avg.mul_(beta1).add_(1 - beta1, grad)

11/02/2020 11:30:44 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [   500] Training Loss - [0.51931] Time Remaining - [0:00:51]
11/02/2020 11:31:37 - mtdnn.modeling_mtdnn - INFO - Saving mt-dnn model to /tmp/tmpkl3fv4wv/checkpoint/model_0.pt
11/02/2020 11:31:43 - mtdnn.modeling_mtdnn - INFO - model saved to /tmp/tmpkl3fv4wv/checkpoint/model_0.pt
11/02/2020 11:31:43 - mtdnn.modeling_mtdnn - INFO - At epoch 1
11/02/2020 11:31:43 - mtdnn.modeling_mtdnn - INFO - Amount of data to go over: 730

### Evaluation and Prediction
Perform inference using the last checkpointed model. Checkpoints are numbered from zero (`model_0.pt` is saved after the first epoch), so with 5 epochs the last model is `model_4.pt`.

In [34]:
model.predict(trained_model_chckpt=f"{OUTPUT_DIR}/model_4.pt")

11/02/2020 11:54:02 - mtdnn.modeling_mtdnn - INFO - Running predictions using: /tmp/tmpkl3fv4wv/checkpoint/model_4.pt
11/02/2020 11:54:03 - mtdnn.modeling_mtdnn - INFO - predicting 0
11/02/2020 11:54:05 - mtdnn.modeling_mtdnn - INFO - Task sst -- epoch 0 -- Dev ACC: 86.520
11/02/2020 11:54:05 - mtdnn.modeling_mtdnn - INFO - predicting 0
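
Rather than hard-coding `model_4.pt`, the most recent checkpoint can be looked up from the files in the output directory. The sketch below assumes checkpoints follow the `model_<epoch>.pt` naming seen in the training log; `latest_checkpoint` is an illustrative helper, and the demo uses empty placeholder files in a temporary directory.

```python
import os
import re
import tempfile

CHECKPOINT_PATTERN = re.compile(r"^model_(\d+)\.pt$")

def latest_checkpoint(directory):
    """Return the path of the highest-numbered model_<epoch>.pt file, or None."""
    best_epoch, best_name = -1, None
    for name in os.listdir(directory):
        match = CHECKPOINT_PATTERN.match(name)
        # Compare epochs numerically so model_10.pt ranks above model_9.pt
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    return os.path.join(directory, best_name) if best_name else None

# Demo with placeholder files standing in for real checkpoints
demo_dir = tempfile.mkdtemp()
for epoch in range(5):
    open(os.path.join(demo_dir, f"model_{epoch}.pt"), "w").close()

checkpoint_path = latest_checkpoint(demo_dir)
print(os.path.basename(checkpoint_path))
```

The returned path could then be passed as `trained_model_chckpt` to `model.predict`.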

### Results

In [35]:
results = {}
# Collect the dev-set score files written to OUTPUT_DIR during prediction
dev_result_files = [f for f in os.listdir(OUTPUT_DIR) if f.endswith('.json') and 'dev' in f]
for d in dev_result_files:
    # Build a readable column name from the first three parts of the file name,
    # e.g. 'Mrpc Dev Scores'
    name = ' '.join(part.capitalize() for part in d.split('_')[:3])
    file_name = os.path.join(OUTPUT_DIR, d)
    with open(file_name, 'r') as f:
        res = json.load(f)
    results[name] = {'ACCURACY': f"{res['metrics']['ACC']:.3f}"}
df_results = pd.DataFrame(results)
df_results

| |Mrpc Dev Scores|Sst Dev Scores|
|:--|:--------------|:-------------|
|ACCURACY|90.8|86.52|
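
The `mrpc` task above lists both `ACC` and `F1` in `metric_meta`, while the table reports accuracy only. For reference, the sketch below computes both for binary labels, as percentages to match the scale of the table. The label lists are made-up toy data, and the helper names are illustrative rather than the toolkit's metric functions.

```python
def accuracy(gold, pred):
    """Percentage of predictions that equal the gold label."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

def f1_score(gold, pred, positive=1):
    """Percentage F1 for the positive class (harmonic mean of precision and recall)."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)

# Toy labels, not results from this notebook
gold = [1, 0, 1, 1, 0, 1]
pred = [1, 0, 0, 1, 1, 1]
acc = accuracy(gold, pred)
f1 = f1_score(gold, pred)
print(f"ACC: {acc:.3f}  F1: {f1:.3f}")
```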


## Clean up temporary folders

In [None]:
if os.path.exists(ROOT_DIR):
    shutil.rmtree(ROOT_DIR, ignore_errors=True)