[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ethanjperez/rda/blob/master/rda.ipynb)

# **Tutorial**: How to run Rissanen Data Analysis on your own dataset

This short notebook shows how you can run Rissanen Data Analysis (RDA) on any dataset, using any model of your choice. To perform RDA, you'll need to compute the Minimum Description Length (MDL) of the labels of your dataset given the inputs. Then, you can modify the dataset inputs (i.e., adding or removing certain features, like nouns) and see how the MDL changes. In this notebook, we'll illustrate how to compute MDL of a dataset in the GLUE benchmark, MRPC, by training BERT models, using the HuggingFace Transformers library.

Throughout this tutorial, we'll point out where you need to change the code to evaluate the MDL of a different dataset and/or with a different model. Only a few lines need to change, so it should be quite straightforward. Let's get started!

## Import Dependencies
First, let's import a few basic packages:

In [1]:
import json
import math
import os
import random
import sys

Next, we'll load a library for training models (with PyTorch, TensorFlow, JAX, or anything else). We'll treat the library's model training function as a black box and call it to train and test on different subsets of the original dataset (which we'll write to file first). All we need is for the model training function to return the loss on unseen test examples. The loss value should be the mean squared error for regression tasks and the negative log-likelihood otherwise; we'll use these values to compute the label description lengths.
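
To make that contract concrete, here is a minimal sketch of a stand-in training function. It is not the BERT setup used in this notebook: `train_and_test` and `read_jsonl` are hypothetical names, the "model" only fits smoothed label frequencies, and it assumes each example is a JSON object with a `label` field stored one per line. The only part that matters is the interface: take data files in, return the mean test NLL (in nats).

```python
import json
import math
import os
import tempfile
from collections import Counter


def read_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def train_and_test(train_file, test_file, num_labels=2):
    """Hypothetical stand-in for a model training function.

    "Trains" by counting labels (with add-one smoothing) and returns the
    mean negative log-likelihood, in nats, on the held-out test file.
    """
    counts = Counter(ex["label"] for ex in read_jsonl(train_file))
    total = sum(counts.values()) + num_labels
    test_examples = read_jsonl(test_file)
    nll = -sum(math.log((counts[ex["label"]] + 1) / total) for ex in test_examples)
    return nll / len(test_examples)


# Tiny demonstration with made-up labels
with tempfile.TemporaryDirectory() as tmp_dir:
    train_path = os.path.join(tmp_dir, "train.json")
    test_path = os.path.join(tmp_dir, "test.json")
    with open(train_path, "w") as f:
        f.write("\n".join(json.dumps({"label": y}) for y in [1, 1, 1, 0]))
    with open(test_path, "w") as f:
        f.write("\n".join(json.dumps({"label": y}) for y in [1, 0]))
    demo_loss = train_and_test(train_path, test_path)

print("mean test NLL (nats):", demo_loss)
```

Any training function that exposes the same kind of interface can be swapped in for the one used below.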

Here, we train BERT models using HuggingFace Transformers. We've cloned the original repo and modified ~3 lines of code to have the model training function return the test loss after training. You can load this model training function like so:

In [2]:
!pip install datasets
!pip install git+https://github.com/ethanjperez/transformers_rda.git
import transformers.run_glue as train_model


## Specify RDA Experimental Setup
Next, let's set a few basic variables (with assertion checks) that control how we'll run RDA:

In [3]:
# The number of blocks N we use when sending labels
# We train N-1 models, since the first block is sent with a uniform prior
num_blocks = 9
assert num_blocks >= 2, 'num_blocks must be >= 2 (block boundaries are spaced over num_blocks - 1 intervals)'

# The minimum/maximum number of examples to train models with
min_num_train_samples = 64
max_num_train_samples = float('inf') # use all examples
assert min_num_train_samples >= 1, 'min_num_train_samples must be >= 1'
assert max_num_train_samples >= min_num_train_samples, 'max_num_train_samples must be >= min_num_train_samples'

# The fraction of examples to split off for validation (the rest are used for training)
val_frac = 0.1
assert 0 <= val_frac < 1, 'val_frac must be >= 0 and < 1'

Now, let's set a few task-specific variables. We need to know the range of possible output values, in order to compute the codelength for sending the first block of labels. For MRPC, there are just two possible labels:

In [4]:
label_range = 2
assert label_range > 0, 'label_range must be > 0'

uniform_prior_nll = -math.log(1. / float(label_range))

For text generation or span prediction, `label_range` will be quite large, as there are many possible outputs. For regression, you'll want to set `label_range` to the size of the interval over which outputs can range, e.g., 3.5 if the range is [1., 4.5]. Speaking of regression, let's set a variable to keep track of whether or not we expect to receive mean squared error values from our model training function (so we'll know to convert MSE values to negative log-likelihoods).

In [5]:
mse = False
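
To make the two `label_range` cases concrete, here is a small sketch of the per-example codelength under a uniform prior, for a 2-way classification task and for a regression target ranging over [1., 4.5]. The helper names `uniform_codelength_nats` and `nats_to_bits` are illustrative and not part of the notebook; the formula is the same `-log(1 / label_range)` used above.

```python
import math


def uniform_codelength_nats(label_range):
    """Per-example codelength (in nats) when every label is equally likely."""
    if label_range <= 0:
        raise ValueError("label_range must be > 0")
    return -math.log(1.0 / label_range)


def nats_to_bits(nats):
    """Convert a codelength from nats (natural log) to bits (log base 2)."""
    return nats / math.log(2)


binary_nats = uniform_codelength_nats(2)              # classification, 2 labels
regression_nats = uniform_codelength_nats(4.5 - 1.0)  # regression over [1., 4.5]

print("binary:", binary_nats, "nats =", nats_to_bits(binary_nats), "bits")
print("regression:", regression_nats, "nats")
```

For the binary case this works out to one bit per label, which matches the intuition that an uninformed sender must spend one bit to communicate each of two equally likely outcomes.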

Lastly, set the command line arguments you want to use to train your model. We'll need to point the model to the training/validation/test data, but those paths will change for each block. For now, we can just set those paths to the special strings `TRAIN_FILE`, `VALIDATION_FILE`, and `TEST_FILE` (we'll replace these with actual file paths later). Here's what this looks like for training a HuggingFace BERT model:

In [6]:
training_args = "--model_name_or_path bert-base-cased --do_train --do_eval --max_seq_length 128 " + \
    "--per_device_train_batch_size 32 --learning_rate 2e-5 --num_train_epochs 3 --output_dir checkpoint " + \
    "--train_file TRAIN_FILE --validation_file VALIDATION_FILE --test_file TEST_FILE --overwrite_output_dir"

Note that `--overwrite_output_dir` will clear the saved results of any previous model training run that saves to the same `--output_dir` (here, `checkpoint/`). This is the behavior that we want, since when we send a new block, we don't want to reload the model, results, etc. for sending an earlier block of data. If you call a different model training function, you'll similarly want to ensure that you aren't accidentally loading the results of previous training runs when you make a new call to the model training function.
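
If your own training function has no equivalent of `--overwrite_output_dir`, one simple option is to wipe the output directory yourself before each call. The sketch below assumes the training function writes everything under a single directory; `reset_output_dir` is a hypothetical helper, not something provided by Transformers.

```python
import os
import shutil
import tempfile


def reset_output_dir(path):
    """Delete `path` and everything in it (if it exists), then recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path


# Demonstration inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_dir = os.path.join(tmp_dir, "checkpoint")
    os.makedirs(checkpoint_dir)
    # Pretend an earlier training run left a file behind
    with open(os.path.join(checkpoint_dir, "stale_results.json"), "w") as f:
        f.write("{}")
    reset_output_dir(checkpoint_dir)
    leftover_files = os.listdir(checkpoint_dir)

print("files left after reset:", leftover_files)
```

Calling such a helper at the start of each block keeps one block's checkpoints and cached results from leaking into the next.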

## Dataset Setup

Now that we've specified our RDA setup, let's load our dataset into a list of examples (these can have any data type/structure). We load the MRPC data (just the training set for MDL evaluation) like so:

In [7]:
from datasets import load_dataset
dataset = list(load_dataset("glue", "mrpc", split='train'))



Downloading and preparing dataset glue/mrpc (download: 1.43 MiB, generated: 1.43 MiB, post-processed: Unknown size, total: 2.85 MiB) to /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/7c99657241149a24692c402a5c3f34d4c9f1df5ac2e4c3759fadea38f6cb29c4...



Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/7c99657241149a24692c402a5c3f34d4c9f1df5ac2e4c3759fadea38f6cb29c4. Subsequent calls will reuse this data.


At this point, you can augment or ablate the input data loaded above if desired. We'll just show how to compute MDL on the original MRPC dataset, so we don't do any input modification here.
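
As an illustration of what an ablation could look like, the sketch below deletes a chosen set of words from the two MRPC text fields (`sentence1` and `sentence2`) while leaving the label untouched. The helper `ablate_words` and the word list are hypothetical; a real experiment would typically pick words with a tagger or parser (e.g., all nouns) rather than a hand-written set.

```python
def ablate_words(example, words_to_remove, text_keys=("sentence1", "sentence2")):
    """Return a copy of `example` with the given words deleted from its text fields."""
    remove = {word.lower() for word in words_to_remove}
    ablated = dict(example)  # shallow copy, so the original example is unchanged
    for key in text_keys:
        tokens = ablated[key].split()
        # Compare tokens case-insensitively, ignoring surrounding punctuation
        kept = [t for t in tokens if t.strip(".,;:!?\"'").lower() not in remove]
        ablated[key] = " ".join(kept)
    return ablated


toy_example = {
    "sentence1": "The cat sat on the mat.",
    "sentence2": "A cat was sitting.",
    "label": 1,
    "idx": 0,
}
ablated_example = ablate_words(toy_example, {"the", "a"})
print(ablated_example)
```

Applied to the whole dataset, this would be `dataset = [ablate_words(ex, words) for ex in dataset]`; computing MDL before and after then shows how much the removed words helped in predicting the labels.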

Next, let's randomly order the dataset examples using a fixed random seed:

In [8]:
seed = 0
rng = random.Random(seed)
rng.shuffle(dataset)

To train the model on different subsets of the data, we'll need a function that saves a list of examples to file, in a format that the model training function can read from. Later, we can point the model training function to different training/validation/test data files as needed, for different blocks. Below, we save instances in a way that is compatible with loading data via HuggingFace datasets:

In [9]:
def save_data(examples, save_file):
    with open(save_file, 'w') as f:
        # Write one JSON object per line (JSON Lines), without a trailing newline
        f.write('\n'.join(json.dumps(ex) for ex in examples))
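
A quick way to check a save format is to read the file back and compare it with what was written. The sketch below repeats `save_data` so that it runs on its own and adds a hypothetical `load_data` reader; the two toy examples are made up. Note that `json.dumps` escapes newlines inside strings, so each example still occupies exactly one line.

```python
import json
import os
import tempfile


def save_data(examples, save_file):
    # Same format as in the cell above: one JSON object per line
    with open(save_file, "w") as f:
        f.write("\n".join(json.dumps(ex) for ex in examples))


def load_data(save_file):
    """Hypothetical reader: parse one JSON object per non-empty line."""
    with open(save_file) as f:
        return [json.loads(line) for line in f if line.strip()]


toy_examples = [
    {"sentence1": "A first sentence.", "sentence2": "A second one.", "label": 1, "idx": 0},
    {"sentence1": "Line\nbreak inside.", "sentence2": "Still one record.", "label": 0, "idx": 1},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.json")
    save_data(toy_examples, path)
    round_trip_ok = load_data(path) == toy_examples

print("round trip preserved the examples:", round_trip_ok)
```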

Let's also set the file extension that we want to use to save data files:

In [10]:
data_file_ext = 'json'

Now, let's compute the starting indices of each block, $t_0, \dots, t_N$:

In [11]:
block_size_logscale_increment = (math.log(min(len(dataset), max_num_train_samples)) - math.log(min_num_train_samples)) / (num_blocks - 1)
block_start_idxs = [0] + [int(round(math.exp(math.log(min_num_train_samples) + (block * block_size_logscale_increment)))) for block in range(num_blocks)]
print('t_0, ..., t_N:', block_start_idxs)

t_0, ..., t_N: [0, 64, 106, 176, 292, 485, 804, 1333, 2211, 3668]
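
The same computation can be wrapped in a function, which makes it easier to see how the boundaries depend on the dataset size and to inspect the resulting block sizes. This is a sketch that mirrors the cell above; the function name `compute_block_start_idxs` is ours, and 3668 is the number of MRPC training examples implied by the last index printed above.

```python
import math


def compute_block_start_idxs(num_examples, num_blocks, min_num_train_samples):
    """Log-spaced block boundaries t_0, ..., t_N, with t_0 = 0 and t_N = num_examples."""
    if num_blocks < 2:
        raise ValueError("num_blocks must be >= 2")
    log_min = math.log(min_num_train_samples)
    increment = (math.log(num_examples) - log_min) / (num_blocks - 1)
    return [0] + [int(round(math.exp(log_min + block * increment))) for block in range(num_blocks)]


idxs = compute_block_start_idxs(num_examples=3668, num_blocks=9, min_num_train_samples=64)
block_sizes = [end - start for start, end in zip(idxs, idxs[1:])]
print("boundaries:", idxs)
print("block sizes:", block_sizes)
```

Because the boundaries are evenly spaced on a log scale, each block is roughly 1.66 times further along than the previous one, so later models are trained on geometrically more data.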


## Computing Negative Log-Likelihoods

Now, we're ready to compute the average negative log-likelihoods (NLLs) for each block.
Let's collect the average NLL for each block in a list, adding the NLL for the first block that we computed earlier:

In [12]:
nlls = [uniform_prior_nll]
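
For orientation, here is a sketch of how the per-block averages collected in this list could be combined into a single description length once all blocks have been sent. It assumes the usual online (prequential) coding scheme, in which each block costs its number of examples times its average per-example NLL, and it assumes the NLLs are in nats. The function name `description_length_bits` and the toy numbers are illustrative, so treat this as a sketch rather than the notebook's exact computation.

```python
import math


def description_length_bits(nlls, block_start_idxs):
    """Sum (block size x average NLL in nats) over blocks and convert to bits."""
    block_sizes = [end - start for start, end in zip(block_start_idxs, block_start_idxs[1:])]
    if len(nlls) != len(block_sizes):
        raise ValueError("expected one average NLL per block")
    total_nats = sum(size * nll for size, nll in zip(block_sizes, nlls))
    return total_nats / math.log(2)


# Toy example: a block of 4 labels sent at 1 bit each, then 6 labels at 0.5 bits each
toy_nlls = [math.log(2), 0.5 * math.log(2)]
toy_block_start_idxs = [0, 4, 10]
toy_mdl_bits = description_length_bits(toy_nlls, toy_block_start_idxs)
print("toy MDL (bits):", toy_mdl_bits)
```

In the toy example, the second block is cheaper per label because the (imagined) model trained on the first block predicts its labels better than a uniform prior would.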

Now, we create train/val/test splits for sending each data block after the first, and then send each block one by one by calling `train_model.main()` to train a model on a chunk of the data and get the test loss on a new block.

In [13]:
# Send each block after the first, one by one
for send_block in range(1, num_blocks):
    # Create the train/validation/test data for sending each block
    train_val_dataset = dataset[:block_start_idxs[send_block]] # train/val examples from blocks before the current one
    rng.shuffle(train_val_dataset) # shuffle examples for random train vs. val split
    val_size = int(round(val_frac * len(train_val_dataset))) # compute size of validation set
    block_datasets = { # get list of examples for each split
        'train': train_val_dataset[val_size:],
        'validation': train_val_dataset[:val_size],
        'test': dataset[block_start_idxs[send_block]: block_start_idxs[send_block + 1]],
    }

    # Save train/validation/test data and add data paths to model training arguments
    block_data_dir = 'data/send_block_' + str(send_block) # where we'll save the train/val/test data for sending the current block
    os.makedirs(block_data_dir, exist_ok=True)
    block_training_args = training_args
    for split, block_dataset in block_datasets.items():
        block_split_filepath = os.path.join(block_data_dir, split + '.' + data_file_ext)
        print('Saving data to:', block_split_filepath)
        # Save the data for a single split of data (train, val, or test)
        save_data(block_dataset, block_split_filepath)
        assert (split.upper() + '_FILE') in training_args, 'Expected ' + split.upper() + '_FILE in training_args'
        # Point training arguments to this block's train/val/test data
        block_training_args = block_training_args.replace(split.upper() + '_FILE', block_split_filepath)

    # Set command line args for model training
    sys.argv = [train_model.__file__] + block_training_args.split()
    # Call main function to train model with above args, to get test NLL on this block
    block_nll = train_model.main()
    nlls.append(block_nll)

Saving data to: data/send_block_1/train.json
Saving data to: data/send_block_1/validation.json
Saving data to: data/send_block_1/test.json


Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
[INFO|run_glue.py:198] 2021-03-06 17:57:25,107 >> Training/evaluation parameters TrainingArguments(output_dir=checkpoint, overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=EvaluationStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=32, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=2e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/Mar06_17-57-25_a65b076a9074, logging_strategy=LoggingStrategy.STEPS, logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_core

Downloading and preparing dataset json/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/json/default-63e0d357392494ff/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2...



Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-63e0d357392494ff/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2. Subsequent calls will reuse this data.


[INFO|file_utils.py:1327] 2021-03-06 17:57:25,423 >> https://huggingface.co/bert-base-cased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmprn2z2ni8



[INFO|file_utils.py:1331] 2021-03-06 17:57:25,467 >> storing https://huggingface.co/bert-base-cased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655
[INFO|file_utils.py:1334] 2021-03-06 17:57:25,469 >> creating metadata file for /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655
[INFO|configuration_utils.py:457] 2021-03-06 17:57:25,470 >> loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655




[INFO|configuration_utils.py:493] 2021-03-06 17:57:25,501 >> Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.4.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

[INFO|file_utils.py:1327] 2021-03-06 17:57:25,534 >> https://huggingface.co/bert-base-cased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpw6t9je1m



[INFO|file_utils.py:1331] 2021-03-06 17:57:25,613 >> storing https://huggingface.co/bert-base-cased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/6508e60ab3c1200bffa26c95f4b58ac6b6d95fba4db1f195f632fa3cd7bc64cc.437aa611e89f6fc6675a049d2b5545390adbc617e7d655286421c191d2be2791
[INFO|file_utils.py:1334] 2021-03-06 17:57:25,614 >> creating metadata file for /root/.cache/huggingface/transformers/6508e60ab3c1200bffa26c95f4b58ac6b6d95fba4db1f195f632fa3cd7bc64cc.437aa611e89f6fc6675a049d2b5545390adbc617e7d655286421c191d2be2791
[INFO|file_utils.py:1327] 2021-03-06 17:57:25,649 >> https://huggingface.co/bert-base-cased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpf5klj6ul






[INFO|file_utils.py:1331] 2021-03-06 17:57:25,720 >> storing https://huggingface.co/bert-base-cased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/226a307193a9f4344264cdc76a12988448a25345ba172f2c7421f3b6810fddad.3dab63143af66769bbb35e3811f75f7e16b2320e12b7935e216bd6159ce6d9a6
[INFO|file_utils.py:1334] 2021-03-06 17:57:25,721 >> creating metadata file for /root/.cache/huggingface/transformers/226a307193a9f4344264cdc76a12988448a25345ba172f2c7421f3b6810fddad.3dab63143af66769bbb35e3811f75f7e16b2320e12b7935e216bd6159ce6d9a6
[INFO|tokenization_utils_base.py:1716] 2021-03-06 17:57:25,722 >> loading file https://huggingface.co/bert-base-cased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/6508e60ab3c1200bffa26c95f4b58ac6b6d95fba4db1f195f632fa3cd7bc64cc.437aa611e89f6fc6675a049d2b5545390adbc617e7d655286421c191d2be2791
[INFO|tokenization_utils_base.py:1716] 2021-03-06 17:57:25,723 >> loading file https://huggingface.co/bert-base-cased/res





[INFO|file_utils.py:1331] 2021-03-06 17:57:33,590 >> storing https://huggingface.co/bert-base-cased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/092cc582560fc3833e556b3f833695c26343cb54b7e88cd02d40821462a74999.1f48cab6c959fc6c360d22bea39d06959e90f5b002e77e836d2da45464875cda
[INFO|file_utils.py:1334] 2021-03-06 17:57:33,591 >> creating metadata file for /root/.cache/huggingface/transformers/092cc582560fc3833e556b3f833695c26343cb54b7e88cd02d40821462a74999.1f48cab6c959fc6c360d22bea39d06959e90f5b002e77e836d2da45464875cda
[INFO|modeling_utils.py:1035] 2021-03-06 17:57:33,592 >> loading weights file https://huggingface.co/bert-base-cased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/092cc582560fc3833e556b3f833695c26343cb54b7e88cd02d40821462a74999.1f48cab6c959fc6c360d22bea39d06959e90f5b002e77e836d2da45464875cda





- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



[INFO|run_glue.py:365] 2021-03-06 17:57:37,446 >> Sample 40 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 612, 'input_ids': [101, 2831, 4312, 1644, 117, 1122, 1110, 5696, 1111, 1482, 1223, 1103, 1425, 1104, 1479, 1106, 1138, 3785, 4125, 119, 102, 1130, 1103, 4893, 117, 148, 2568, 2382, 1115, 1122, 1110, 5696, 1111, 1482, 1115, 1685, 1106, 1138, 3785, 4125, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0




[INFO|trainer.py:484] 2021-03-06 17:57:47,639 >> The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:484] 2021-03-06 17:57:47,641 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:934] 2021-03-06 17:57:47,877 >> ***** Running training *****
[INFO|trainer.py:935] 2021-03-06 17:57:47,878 >>   Num examples = 58
[INFO|trainer.py:936] 2021-03-06 17:57:47,884 >>   Num Epochs = 3
[INFO|trainer.py:937] 2021-03-06 17:57:47,887 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:938] 2021-03-06 17:57:47,888 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:939] 2021-03-06 17:57:47,893 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:940] 2021-03-06 17:57:47,896 >

Step,Training Loss


[INFO|trainer.py:1117] 2021-03-06 17:57:51,736 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1535] 2021-03-06 17:57:51,949 >> Saving model checkpoint to checkpoint
[INFO|configuration_utils.py:312] 2021-03-06 17:57:51,952 >> Configuration saved in checkpoint/config.json
[INFO|modeling_utils.py:825] 2021-03-06 17:57:53,705 >> Model weights saved in checkpoint/pytorch_model.bin
[INFO|tokenization_utils_base.py:1910] 2021-03-06 17:57:53,708 >> tokenizer config file saved in checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:1916] 2021-03-06 17:57:53,708 >> Special tokens file saved in checkpoint/special_tokens_map.json
[INFO|run_glue.py:423] 2021-03-06 17:57:53,741 >> ***** Train results *****
[INFO|run_glue.py:425] 2021-03-06 17:57:53,742 >>   epoch = 3.0
[INFO|run_glue.py:425] 2021-03-06 17:57:53,743 >>   init_mem_cpu_alloc_delta = 344712
[INFO|run_glue.py:425] 2021-03-06 17:57:53,743 >>   init_mem_cpu_peaked_del

[INFO|run_glue.py:449] 2021-03-06 17:57:54,388 >> ***** Eval results None *****
[INFO|run_glue.py:451] 2021-03-06 17:57:54,389 >>   epoch = 3.0
[INFO|run_glue.py:451] 2021-03-06 17:57:54,389 >>   eval_accuracy = 0.7142857313156128
[INFO|run_glue.py:451] 2021-03-06 17:57:54,395 >>   eval_loss = 0.592985987663269
[INFO|run_glue.py:451] 2021-03-06 17:57:54,399 >>   eval_mem_cpu_alloc_delta = 87950
[INFO|run_glue.py:451] 2021-03-06 17:57:54,401 >>   eval_mem_cpu_peaked_delta = 18278
[INFO|run_glue.py:451] 2021-03-06 17:57:54,403 >>   eval_mem_gpu_alloc_delta = 0
[INFO|run_glue.py:451] 2021-03-06 17:57:54,404 >>   eval_mem_gpu_peaked_delta = 34635776
[INFO|run_glue.py:451] 2021-03-06 17:57:54,409 >>   eval_runtime = 0.3896
[INFO|run_glue.py:451] 2021-03-06 17:57:54,410 >>   eval_samples_per_second = 107.812
[INFO|training_args.py:610] 2021-03-06 17:57:54,432 >> PyTorch: setting up devices
[INFO|training_args.py:534] 2021-03-06 17:57:54,434 >> The default value for the training argument `--r

Saving data to: data/send_block_2/train.json
Saving data to: data/send_block_2/validation.json
Saving data to: data/send_block_2/test.json
Downloading and preparing dataset json/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/json/default-4d32bd19c4eceb1a/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2...



Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-4d32bd19c4eceb1a/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2. Subsequent calls will reuse this data.


[INFO|configuration_utils.py:457] 2021-03-06 17:57:54,754 >> loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655

[INFO|run_glue.py:365] 2021-03-06 17:57:58,468 >> Sample 81 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 2720, 'input_ids': [101, 1252, 1103, 4201, 4181, 2341, 8128, 5323, 10378, 112, 188, 6109, 1104, 1115, 21100, 119, 102, 1130, 1157, 2592, 117, 1103, 2880, 5707, 3914, 170, 4832, 4063, 5790, 1828, 119, 10378, 1104, 4917, 1104, 3995, 1158, 4810, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,




[INFO|trainer.py:484] 2021-03-06 17:57:58,843 >> The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:484] 2021-03-06 17:57:58,845 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:934] 2021-03-06 17:57:59,079 >> ***** Running training *****
[INFO|trainer.py:935] 2021-03-06 17:57:59,080 >>   Num examples = 95
[INFO|trainer.py:936] 2021-03-06 17:57:59,081 >>   Num Epochs = 3
[INFO|trainer.py:937] 2021-03-06 17:57:59,082 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:938] 2021-03-06 17:57:59,083 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:939] 2021-03-06 17:57:59,084 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:940] 2021-03-06 17:57:59,085 >

Step,Training Loss


[INFO|trainer.py:1117] 2021-03-06 17:58:04,812 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1535] 2021-03-06 17:58:05,014 >> Saving model checkpoint to checkpoint
[INFO|configuration_utils.py:312] 2021-03-06 17:58:05,017 >> Configuration saved in checkpoint/config.json
[INFO|modeling_utils.py:825] 2021-03-06 17:58:06,343 >> Model weights saved in checkpoint/pytorch_model.bin
[INFO|tokenization_utils_base.py:1910] 2021-03-06 17:58:06,346 >> tokenizer config file saved in checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:1916] 2021-03-06 17:58:06,347 >> Special tokens file saved in checkpoint/special_tokens_map.json
[INFO|run_glue.py:423] 2021-03-06 17:58:06,385 >> ***** Train results *****
[INFO|run_glue.py:425] 2021-03-06 17:58:06,386 >>   epoch = 3.0
[INFO|run_glue.py:425] 2021-03-06 17:58:06,386 >>   init_mem_cpu_alloc_delta = 51555
[INFO|run_glue.py:425] 2021-03-06 17:58:06,388 >>   init_mem_cpu_peaked_delt

[INFO|run_glue.py:449] 2021-03-06 17:58:07,178 >> ***** Eval results None *****
[INFO|run_glue.py:451] 2021-03-06 17:58:07,179 >>   epoch = 3.0
[INFO|run_glue.py:451] 2021-03-06 17:58:07,180 >>   eval_accuracy = 0.8142856955528259
[INFO|run_glue.py:451] 2021-03-06 17:58:07,182 >>   eval_loss = 0.5178131461143494
[INFO|run_glue.py:451] 2021-03-06 17:58:07,184 >>   eval_mem_cpu_alloc_delta = 59128
[INFO|run_glue.py:451] 2021-03-06 17:58:07,187 >>   eval_mem_cpu_peaked_delta = 18278
[INFO|run_glue.py:451] 2021-03-06 17:58:07,188 >>   eval_mem_gpu_alloc_delta = 0
[INFO|run_glue.py:451] 2021-03-06 17:58:07,193 >>   eval_mem_gpu_peaked_delta = 34635776
[INFO|run_glue.py:451] 2021-03-06 17:58:07,196 >>   eval_runtime = 0.4953
[INFO|run_glue.py:451] 2021-03-06 17:58:07,197 >>   eval_samples_per_second = 141.319
[INFO|training_args.py:610] 2021-03-06 17:58:07,233 >> PyTorch: setting up devices
[INFO|training_args.py:534] 2021-03-06 17:58:07,236 >> The default value for the training argument `--

Saving data to: data/send_block_3/train.json
Saving data to: data/send_block_3/validation.json
Saving data to: data/send_block_3/test.json
Downloading and preparing dataset json/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/json/default-ca938a1b2e2d9c6f/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2...



Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-ca938a1b2e2d9c6f/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2. Subsequent calls will reuse this data.


[INFO|configuration_utils.py:457] 2021-03-06 17:58:07,605 >> loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655

[INFO|run_glue.py:365] 2021-03-06 17:58:11,363 >> Sample 28 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 759, 'input_ids': [101, 1109, 5626, 1338, 1170, 158, 119, 156, 119, 1574, 5274, 2499, 155, 119, 1537, 1107, 5154, 1392, 4741, 1314, 1989, 1115, 1103, 143, 9481, 10778, 3748, 1106, 1576, 1103, 25097, 119, 102, 158, 119, 156, 119, 1574, 5274, 2499, 155, 119, 1537, 4741, 9667, 1107, 5154, 1392, 1115, 1103, 143, 9481, 14756, 3748, 1106, 1576, 1103, 25097, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,




[INFO|trainer.py:484] 2021-03-06 17:58:11,738 >> The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:484] 2021-03-06 17:58:11,739 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:934] 2021-03-06 17:58:11,971 >> ***** Running training *****
[INFO|trainer.py:935] 2021-03-06 17:58:11,972 >>   Num examples = 158
[INFO|trainer.py:936] 2021-03-06 17:58:11,973 >>   Num Epochs = 3
[INFO|trainer.py:937] 2021-03-06 17:58:11,974 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:938] 2021-03-06 17:58:11,975 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:939] 2021-03-06 17:58:11,976 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:940] 2021-03-06 17:58:11,977 

[INFO|trainer.py:1117] 2021-03-06 17:58:21,499 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1535] 2021-03-06 17:58:21,706 >> Saving model checkpoint to checkpoint
[INFO|configuration_utils.py:312] 2021-03-06 17:58:21,711 >> Configuration saved in checkpoint/config.json
[INFO|modeling_utils.py:825] 2021-03-06 17:58:23,154 >> Model weights saved in checkpoint/pytorch_model.bin
[INFO|tokenization_utils_base.py:1910] 2021-03-06 17:58:23,157 >> tokenizer config file saved in checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:1916] 2021-03-06 17:58:23,160 >> Special tokens file saved in checkpoint/special_tokens_map.json
[INFO|run_glue.py:423] 2021-03-06 17:58:23,196 >> ***** Train results *****
[INFO|run_glue.py:425] 2021-03-06 17:58:23,196 >>   epoch = 3.0
[INFO|run_glue.py:425] 2021-03-06 17:58:23,197 >>   init_mem_cpu_alloc_delta = 53541
[INFO|run_glue.py:425] 2021-03-06 17:58:23,198 >>   init_mem_cpu_peaked_delt

[INFO|run_glue.py:449] 2021-03-06 17:58:24,294 >> ***** Eval results None *****
[INFO|run_glue.py:451] 2021-03-06 17:58:24,295 >>   epoch = 3.0
[INFO|run_glue.py:451] 2021-03-06 17:58:24,296 >>   eval_accuracy = 0.6551724076271057
[INFO|run_glue.py:451] 2021-03-06 17:58:24,302 >>   eval_loss = 0.6485145688056946
[INFO|run_glue.py:451] 2021-03-06 17:58:24,303 >>   eval_mem_cpu_alloc_delta = 61364
[INFO|run_glue.py:451] 2021-03-06 17:58:24,305 >>   eval_mem_cpu_peaked_delta = 18278
[INFO|run_glue.py:451] 2021-03-06 17:58:24,306 >>   eval_mem_gpu_alloc_delta = 0
[INFO|run_glue.py:451] 2021-03-06 17:58:24,307 >>   eval_mem_gpu_peaked_delta = 34636800
[INFO|run_glue.py:451] 2021-03-06 17:58:24,308 >>   eval_runtime = 0.8206
[INFO|run_glue.py:451] 2021-03-06 17:58:24,309 >>   eval_samples_per_second = 141.355
[INFO|training_args.py:610] 2021-03-06 17:58:24,366 >> PyTorch: setting up devices
[INFO|training_args.py:534] 2021-03-06 17:58:24,367 >> The default value for the training argument `--

Saving data to: data/send_block_4/train.json
Saving data to: data/send_block_4/validation.json
Saving data to: data/send_block_4/test.json
Downloading and preparing dataset json/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/json/default-d11346695d2e104a/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2...


Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-d11346695d2e104a/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2. Subsequent calls will reuse this data.


[INFO|configuration_utils.py:457] 2021-03-06 17:58:24,737 >> loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655

[INFO|conf

[INFO|run_glue.py:365] 2021-03-06 17:58:28,540 >> Sample 57 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 3305, 'input_ids': [101, 2664, 3756, 1163, 5229, 1104, 7056, 14255, 4121, 3660, 1113, 1103, 1300, 26411, 1116, 12292, 117, 4717, 1106, 26499, 1147, 15346, 119, 102, 1740, 2664, 2103, 1346, 2052, 1115, 5229, 1104, 7056, 14255, 4121, 3660, 1113, 1300, 26411, 1116, 12292, 117, 4717, 1106, 26499, 1147, 15346, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0




[INFO|trainer.py:484] 2021-03-06 17:58:28,876 >> The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:484] 2021-03-06 17:58:28,878 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:934] 2021-03-06 17:58:29,099 >> ***** Running training *****
[INFO|trainer.py:935] 2021-03-06 17:58:29,101 >>   Num examples = 263
[INFO|trainer.py:936] 2021-03-06 17:58:29,101 >>   Num Epochs = 3
[INFO|trainer.py:937] 2021-03-06 17:58:29,103 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:938] 2021-03-06 17:58:29,108 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:939] 2021-03-06 17:58:29,111 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:940] 2021-03-06 17:58:29,112 

[INFO|trainer.py:1117] 2021-03-06 17:58:45,176 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1535] 2021-03-06 17:58:45,388 >> Saving model checkpoint to checkpoint
[INFO|configuration_utils.py:312] 2021-03-06 17:58:45,396 >> Configuration saved in checkpoint/config.json
[INFO|modeling_utils.py:825] 2021-03-06 17:58:46,922 >> Model weights saved in checkpoint/pytorch_model.bin
[INFO|tokenization_utils_base.py:1910] 2021-03-06 17:58:46,924 >> tokenizer config file saved in checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:1916] 2021-03-06 17:58:46,930 >> Special tokens file saved in checkpoint/special_tokens_map.json
[INFO|run_glue.py:423] 2021-03-06 17:58:46,967 >> ***** Train results *****
[INFO|run_glue.py:425] 2021-03-06 17:58:46,968 >>   epoch = 3.0
[INFO|run_glue.py:425] 2021-03-06 17:58:46,969 >>   init_mem_cpu_alloc_delta = 54189
[INFO|run_glue.py:425] 2021-03-06 17:58:46,970 >>   init_mem_cpu_peaked_delt

[INFO|run_glue.py:449] 2021-03-06 17:58:48,742 >> ***** Eval results None *****
[INFO|run_glue.py:451] 2021-03-06 17:58:48,743 >>   epoch = 3.0
[INFO|run_glue.py:451] 2021-03-06 17:58:48,744 >>   eval_accuracy = 0.6632124185562134
[INFO|run_glue.py:451] 2021-03-06 17:58:48,749 >>   eval_loss = 0.6163193583488464
[INFO|run_glue.py:451] 2021-03-06 17:58:48,750 >>   eval_mem_cpu_alloc_delta = 61797
[INFO|run_glue.py:451] 2021-03-06 17:58:48,751 >>   eval_mem_cpu_peaked_delta = 28828
[INFO|run_glue.py:451] 2021-03-06 17:58:48,758 >>   eval_mem_gpu_alloc_delta = 0
[INFO|run_glue.py:451] 2021-03-06 17:58:48,759 >>   eval_mem_gpu_peaked_delta = 34638336
[INFO|run_glue.py:451] 2021-03-06 17:58:48,760 >>   eval_runtime = 1.4865
[INFO|run_glue.py:451] 2021-03-06 17:58:48,761 >>   eval_samples_per_second = 129.831
[INFO|training_args.py:610] 2021-03-06 17:58:48,820 >> PyTorch: setting up devices
[INFO|training_args.py:534] 2021-03-06 17:58:48,821 >> The default value for the training argument `--

Saving data to: data/send_block_5/train.json
Saving data to: data/send_block_5/validation.json
Saving data to: data/send_block_5/test.json
Downloading and preparing dataset json/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/json/default-0e43fe145abe7692/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2...


Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-0e43fe145abe7692/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2. Subsequent calls will reuse this data.


[INFO|configuration_utils.py:457] 2021-03-06 17:58:49,093 >> loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655

[INFO|conf

[INFO|run_glue.py:365] 2021-03-06 17:58:53,004 >> Sample 327 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 38, 'input_ids': [101, 11336, 11154, 1468, 147, 119, 140, 119, 9223, 2254, 3291, 119, 3561, 119, 113, 147, 13113, 114, 1105, 160, 1348, 24448, 3291, 119, 113, 22751, 2349, 114, 5642, 1614, 1228, 1113, 6356, 119, 102, 11336, 11154, 1468, 147, 119, 140, 119, 9223, 2254, 3291, 119, 3561, 119, 147, 13113, 119, 151, 1105, 160, 1348, 24448, 3291, 119, 22751, 2349, 119, 151, 5642, 1614, 1228, 1113, 6356, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,




[INFO|trainer.py:484] 2021-03-06 17:58:53,344 >> The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:484] 2021-03-06 17:58:53,345 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:934] 2021-03-06 17:58:53,561 >> ***** Running training *****
[INFO|trainer.py:935] 2021-03-06 17:58:53,561 >>   Num examples = 437
[INFO|trainer.py:936] 2021-03-06 17:58:53,562 >>   Num Epochs = 3
[INFO|trainer.py:937] 2021-03-06 17:58:53,564 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:938] 2021-03-06 17:58:53,565 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:939] 2021-03-06 17:58:53,566 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:940] 2021-03-06 17:58:53,567 

[INFO|trainer.py:1117] 2021-03-06 17:59:20,395 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1535] 2021-03-06 17:59:20,613 >> Saving model checkpoint to checkpoint
[INFO|configuration_utils.py:312] 2021-03-06 17:59:20,619 >> Configuration saved in checkpoint/config.json
[INFO|modeling_utils.py:825] 2021-03-06 17:59:22,081 >> Model weights saved in checkpoint/pytorch_model.bin
[INFO|tokenization_utils_base.py:1910] 2021-03-06 17:59:22,083 >> tokenizer config file saved in checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:1916] 2021-03-06 17:59:22,085 >> Special tokens file saved in checkpoint/special_tokens_map.json
[INFO|run_glue.py:423] 2021-03-06 17:59:22,136 >> ***** Train results *****
[INFO|run_glue.py:425] 2021-03-06 17:59:22,137 >>   epoch = 3.0
[INFO|run_glue.py:425] 2021-03-06 17:59:22,140 >>   init_mem_cpu_alloc_delta = 54181
[INFO|run_glue.py:425] 2021-03-06 17:59:22,141 >>   init_mem_cpu_peaked_delt

[INFO|run_glue.py:449] 2021-03-06 17:59:24,823 >> ***** Eval results None *****
[INFO|run_glue.py:451] 2021-03-06 17:59:24,824 >>   epoch = 3.0
[INFO|run_glue.py:451] 2021-03-06 17:59:24,825 >>   eval_accuracy = 0.6990595459938049
[INFO|run_glue.py:451] 2021-03-06 17:59:24,834 >>   eval_loss = 0.5838158130645752
[INFO|run_glue.py:451] 2021-03-06 17:59:24,837 >>   eval_mem_cpu_alloc_delta = 63180
[INFO|run_glue.py:451] 2021-03-06 17:59:24,839 >>   eval_mem_cpu_peaked_delta = 32757
[INFO|run_glue.py:451] 2021-03-06 17:59:24,840 >>   eval_mem_gpu_alloc_delta = 0
[INFO|run_glue.py:451] 2021-03-06 17:59:24,847 >>   eval_mem_gpu_peaked_delta = 34640896
[INFO|run_glue.py:451] 2021-03-06 17:59:24,850 >>   eval_runtime = 2.3904
[INFO|run_glue.py:451] 2021-03-06 17:59:24,851 >>   eval_samples_per_second = 133.452
[INFO|training_args.py:610] 2021-03-06 17:59:24,933 >> PyTorch: setting up devices
[INFO|training_args.py:534] 2021-03-06 17:59:24,935 >> The default value for the training argument `--

Saving data to: data/send_block_6/train.json
Saving data to: data/send_block_6/validation.json
Saving data to: data/send_block_6/test.json
Downloading and preparing dataset json/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/json/default-df3adf3e4bf36b30/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2...


Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-df3adf3e4bf36b30/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2. Subsequent calls will reuse this data.


[INFO|configuration_utils.py:457] 2021-03-06 17:59:25,248 >> loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655

[INFO|conf

[INFO|run_glue.py:365] 2021-03-06 17:59:29,300 >> Sample 654 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 2475, 'input_ids': [101, 1109, 7141, 16023, 117, 1680, 9031, 1170, 1750, 1190, 1300, 2005, 1104, 3687, 24851, 1891, 117, 1723, 170, 123, 122, 120, 123, 1989, 3443, 117, 1219, 1134, 160, 22118, 8376, 1197, 2533, 1471, 119, 102, 1109, 3613, 10774, 1723, 170, 123, 122, 120, 123, 1989, 3443, 117, 1219, 1134, 1103, 159, 25191, 2758, 1391, 1299, 2533, 1471, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0




[INFO|run_glue.py:365] 2021-03-06 17:59:29,302 >> Sample 114 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 1776, 'input_ids': [101, 2543, 1550, 14619, 117, 3306, 1118, 2124, 1231, 24191, 1197, 24664, 3491, 1161, 13411, 117, 1108, 1106, 1129, 8243, 2135, 170, 2124, 24096, 117, 13114, 22515, 15748, 1116, 119, 102, 138, 2124, 24096, 117, 13114, 22515, 15748, 1116, 117, 1108, 1106, 1129, 8243, 1114, 1330, 1550, 14619, 117, 3306, 1118, 2124, 1231, 24191, 1197, 24664, 3491, 1161, 13411, 117, 1107, 1103, 4427, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

[INFO|trainer.py:1117] 2021-03-06 18:00:15,250 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1535] 2021-03-06 18:00:15,465 >> Saving model checkpoint to checkpoint
[INFO|configuration_utils.py:312] 2021-03-06 18:00:15,469 >> Configuration saved in checkpoint/config.json
[INFO|modeling_utils.py:825] 2021-03-06 18:00:16,647 >> Model weights saved in checkpoint/pytorch_model.bin
[INFO|tokenization_utils_base.py:1910] 2021-03-06 18:00:16,649 >> tokenizer config file saved in checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:1916] 2021-03-06 18:00:16,651 >> Special tokens file saved in checkpoint/special_tokens_map.json
[INFO|run_glue.py:423] 2021-03-06 18:00:16,698 >> ***** Train results *****
[INFO|run_glue.py:425] 2021-03-06 18:00:16,700 >>   epoch = 3.0
[INFO|run_glue.py:425] 2021-03-06 18:00:16,701 >>   init_mem_cpu_alloc_delta = 55869
[INFO|run_glue.py:425] 2021-03-06 18:00:16,702 >>   init_mem_cpu_peaked_delt

[INFO|run_glue.py:449] 2021-03-06 18:00:21,107 >> ***** Eval results None *****
[INFO|run_glue.py:451] 2021-03-06 18:00:21,108 >>   epoch = 3.0
[INFO|run_glue.py:451] 2021-03-06 18:00:21,110 >>   eval_accuracy = 0.7145557403564453
[INFO|run_glue.py:451] 2021-03-06 18:00:21,112 >>   eval_loss = 0.5685139894485474
[INFO|run_glue.py:451] 2021-03-06 18:00:21,113 >>   eval_mem_cpu_alloc_delta = 65966
[INFO|run_glue.py:451] 2021-03-06 18:00:21,114 >>   eval_mem_cpu_peaked_delta = 45987
[INFO|run_glue.py:451] 2021-03-06 18:00:21,115 >>   eval_mem_gpu_alloc_delta = 0
[INFO|run_glue.py:451] 2021-03-06 18:00:21,115 >>   eval_mem_gpu_peaked_delta = 34646016
[INFO|run_glue.py:451] 2021-03-06 18:00:21,116 >>   eval_runtime = 4.0577
[INFO|run_glue.py:451] 2021-03-06 18:00:21,118 >>   eval_samples_per_second = 130.37
[INFO|training_args.py:610] 2021-03-06 18:00:21,260 >> PyTorch: setting up devices
[INFO|training_args.py:534] 2021-03-06 18:00:21,262 >> The default value for the training argument `--r

Saving data to: data/send_block_7/train.json
Saving data to: data/send_block_7/validation.json
Saving data to: data/send_block_7/test.json
Downloading and preparing dataset json/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/json/default-c76ed36f560ac654/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2...


Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-c76ed36f560ac654/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2. Subsequent calls will reuse this data.


[INFO|configuration_utils.py:457] 2021-03-06 18:00:21,684 >> loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655

[INFO|conf

[INFO|run_glue.py:365] 2021-03-06 18:00:25,850 >> Sample 228 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 1640, 'input_ids': [101, 107, 1188, 1110, 3908, 6944, 2983, 11486, 2698, 117, 107, 1163, 3895, 144, 13703, 117, 1126, 14582, 1120, 140, 27954, 1658, 1291, 28023, 119, 102, 1220, 1132, 2140, 1280, 1106, 1647, 3813, 1111, 159, 1883, 2599, 2007, 1142, 1214, 117, 107, 1163, 3895, 144, 13703, 117, 1126, 14582, 1120, 140, 27954, 1658, 1291, 28023, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,




[INFO|trainer.py:484] 2021-03-06 18:00:26,220 >> The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:484] 2021-03-06 18:00:26,222 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:934] 2021-03-06 18:00:26,477 >> ***** Running training *****
[INFO|trainer.py:935] 2021-03-06 18:00:26,478 >>   Num examples = 1200
[INFO|trainer.py:936] 2021-03-06 18:00:26,479 >>   Num Epochs = 3
[INFO|trainer.py:937] 2021-03-06 18:00:26,488 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:938] 2021-03-06 18:00:26,489 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:939] 2021-03-06 18:00:26,496 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:940] 2021-03-06 18:00:26,497

[INFO|trainer.py:1117] 2021-03-06 18:01:43,294 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1535] 2021-03-06 18:01:43,530 >> Saving model checkpoint to checkpoint
[INFO|configuration_utils.py:312] 2021-03-06 18:01:43,534 >> Configuration saved in checkpoint/config.json
[INFO|modeling_utils.py:825] 2021-03-06 18:01:44,971 >> Model weights saved in checkpoint/pytorch_model.bin
[INFO|tokenization_utils_base.py:1910] 2021-03-06 18:01:44,976 >> tokenizer config file saved in checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:1916] 2021-03-06 18:01:44,977 >> Special tokens file saved in checkpoint/special_tokens_map.json
[INFO|run_glue.py:423] 2021-03-06 18:01:45,023 >> ***** Train results *****
[INFO|run_glue.py:425] 2021-03-06 18:01:45,024 >>   epoch = 3.0
[INFO|run_glue.py:425] 2021-03-06 18:01:45,025 >>   init_mem_cpu_alloc_delta = 53893
[INFO|run_glue.py:425] 2021-03-06 18:01:45,027 >>   init_mem_cpu_peaked_delt

[INFO|run_glue.py:449] 2021-03-06 18:01:52,295 >> ***** Eval results None *****
[INFO|run_glue.py:451] 2021-03-06 18:01:52,296 >>   epoch = 3.0
[INFO|run_glue.py:451] 2021-03-06 18:01:52,296 >>   eval_accuracy = 0.7471526265144348
[INFO|run_glue.py:451] 2021-03-06 18:01:52,305 >>   eval_loss = 0.5319785475730896
[INFO|run_glue.py:451] 2021-03-06 18:01:52,308 >>   eval_mem_cpu_alloc_delta = 74161
[INFO|run_glue.py:451] 2021-03-06 18:01:52,309 >>   eval_mem_cpu_peaked_delta = 70099
[INFO|run_glue.py:451] 2021-03-06 18:01:52,311 >>   eval_mem_gpu_alloc_delta = 0
[INFO|run_glue.py:451] 2021-03-06 18:01:52,313 >>   eval_mem_gpu_peaked_delta = 34652160
[INFO|run_glue.py:451] 2021-03-06 18:01:52,314 >>   eval_runtime = 6.9561
[INFO|run_glue.py:451] 2021-03-06 18:01:52,315 >>   eval_samples_per_second = 126.221
[INFO|training_args.py:610] 2021-03-06 18:01:52,507 >> PyTorch: setting up devices
[INFO|training_args.py:534] 2021-03-06 18:01:52,511 >> The default value for the training argument `--

Saving data to: data/send_block_8/train.json
Saving data to: data/send_block_8/validation.json
Saving data to: data/send_block_8/test.json


[INFO|run_glue.py:237] 2021-03-06 18:01:52,539 >> load a local file for test: data/send_block_8/test.json


Downloading and preparing dataset json/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/json/default-0af92e7d7421e2fb/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2...


Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-0af92e7d7421e2fb/0.0.0/dc7ee63ec8b554c48ecc5a8a6fbe27af8071408c244e4347cf9222d6206d83a2. Subsequent calls will reuse this data.


[INFO|configuration_utils.py:457] 2021-03-06 18:01:52,927 >> loading configuration file https://huggingface.co/bert-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a803e0468a8fe090683bdc453f4fac622804f49de86d7cecaee92365d4a0f829.0d87139f53a477d9f900f8a9020c367863079014bdaf2aa713f4b64cf1782655

[INFO|conf

[INFO|run_glue.py:365] 2021-03-06 18:01:57,282 >> Sample 1309 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'idx': 3567, 'input_ids': [101, 153, 10783, 3663, 20595, 2349, 18784, 1667, 20575, 13225, 117, 1150, 11526, 1103, 3785, 8011, 1114, 1109, 5334, 2381, 1135, 1732, 1105, 3100, 25911, 156, 5674, 2723, 2977, 4242, 136, 102, 1622, 5259, 25795, 1176, 107, 1109, 5334, 2381, 1135, 1732, 107, 113, 3130, 114, 117, 107, 3100, 25911, 156, 5674, 2723, 2977, 4242, 136, 107, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 




[INFO|trainer.py:484] 2021-03-06 18:01:57,644 >> The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:484] 2021-03-06 18:01:57,645 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: idx, sentence1, sentence2.
[INFO|trainer.py:934] 2021-03-06 18:01:57,905 >> ***** Running training *****
[INFO|trainer.py:935] 2021-03-06 18:01:57,906 >>   Num examples = 1990
[INFO|trainer.py:936] 2021-03-06 18:01:57,907 >>   Num Epochs = 3
[INFO|trainer.py:937] 2021-03-06 18:01:57,915 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:938] 2021-03-06 18:01:57,917 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:939] 2021-03-06 18:01:57,918 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:940] 2021-03-06 18:01:57,920

[INFO|trainer.py:1117] 2021-03-06 18:04:06,588 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1535] 2021-03-06 18:04:06,815 >> Saving model checkpoint to checkpoint
[INFO|configuration_utils.py:312] 2021-03-06 18:04:06,819 >> Configuration saved in checkpoint/config.json
[INFO|modeling_utils.py:825] 2021-03-06 18:04:08,015 >> Model weights saved in checkpoint/pytorch_model.bin
[INFO|tokenization_utils_base.py:1910] 2021-03-06 18:04:08,020 >> tokenizer config file saved in checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:1916] 2021-03-06 18:04:08,022 >> Special tokens file saved in checkpoint/special_tokens_map.json
[INFO|run_glue.py:423] 2021-03-06 18:04:08,063 >> ***** Train results *****
[INFO|run_glue.py:425] 2021-03-06 18:04:08,064 >>   epoch = 3.0
[INFO|run_glue.py:425] 2021-03-06 18:04:08,065 >>   init_mem_cpu_alloc_delta = 53685
[INFO|run_glue.py:425] 2021-03-06 18:04:08,066 >>   init_mem_cpu_peaked_delt

[INFO|run_glue.py:449] 2021-03-06 18:04:19,919 >> ***** Eval results None *****
[INFO|run_glue.py:451] 2021-03-06 18:04:19,920 >>   epoch = 3.0
[INFO|run_glue.py:451] 2021-03-06 18:04:19,921 >>   eval_accuracy = 0.7652711272239685
[INFO|run_glue.py:451] 2021-03-06 18:04:19,928 >>   eval_loss = 0.5069981217384338
[INFO|run_glue.py:451] 2021-03-06 18:04:19,934 >>   eval_mem_cpu_alloc_delta = 81219
[INFO|run_glue.py:451] 2021-03-06 18:04:19,936 >>   eval_mem_cpu_peaked_delta = 116544
[INFO|run_glue.py:451] 2021-03-06 18:04:19,937 >>   eval_mem_gpu_alloc_delta = 0
[INFO|run_glue.py:451] 2021-03-06 18:04:19,938 >>   eval_mem_gpu_peaked_delta = 34663936
[INFO|run_glue.py:451] 2021-03-06 18:04:19,939 >>   eval_runtime = 11.5402
[INFO|run_glue.py:451] 2021-03-06 18:04:19,942 >>   eval_samples_per_second = 126.254


We're nearly done now that we have the loss on each block. If the model training function (`train_model.main()`) returned mean squared error values instead of NLLs (e.g., for regression), we'll need to convert them to NLLs:

In [14]:
def mse2nll(mse, std_dev=1.):
    """
    Utility to convert expected mean squared error (MSE) to expected NLL (in nats).
    Here, MSE = (y' - y)^2, where y' is the predicted label and y is the true label.
    We treat each regression prediction as the mean of a Gaussian with a fixed std_dev.
    We use std_dev=1 as a default, but other values may work better, e.g., if chosen on dev.
    """
    return (mse / (2. * (std_dev ** 2))) + math.log(std_dev * math.sqrt(2 * math.pi))

if mse:
    # The losses are MSEs, so convert each one to an NLL
    nlls = [mse2nll(loss) for loss in nlls]
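
To see why this conversion is reasonable, here's a minimal sketch that compares it against the average Gaussian NLL computed directly from per-example predictions. The labels, predictions, and `std_dev` below are made-up values for illustration, and `gaussian_nll` is a helper we introduce here (it is not part of the notebook or of any library).

```python
import math

def mse2nll(mse, std_dev=1.0):
    # Same conversion as in the cell above: expected MSE -> expected NLL (nats)
    return mse / (2.0 * std_dev ** 2) + math.log(std_dev * math.sqrt(2 * math.pi))

def gaussian_nll(y_true, y_pred, std_dev=1.0):
    # Hypothetical helper: -log N(y_true; mean=y_pred, variance=std_dev^2), in nats
    density = math.exp(-((y_true - y_pred) ** 2) / (2.0 * std_dev ** 2))
    density /= std_dev * math.sqrt(2 * math.pi)
    return -math.log(density)

# Made-up regression labels and predictions, for illustration only
y_true = [0.5, 2.0, 3.5, 1.0]
y_pred = [0.0, 2.5, 3.0, 2.0]
std_dev = 0.8

# Mean squared error over the examples
mse_value = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

# Average of the per-example Gaussian NLLs
mean_nll = sum(gaussian_nll(t, p, std_dev) for t, p in zip(y_true, y_pred)) / len(y_true)

print('NLL from MSE:       ', mse2nll(mse_value, std_dev))
print('Mean per-example NLL:', mean_nll)
```

The two printed values should agree up to floating-point error, since averaging the per-example Gaussian NLLs only averages the squared-error term.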

Next, we'll convert the NLLs (in nats) to per-sample codelengths (in bits) for the different blocks (in order from earliest to latest):

In [15]:
codelengths = [nll / math.log(2) for nll in nlls]
print('Codelengths:', codelengths)

Codelengths: [1.0, 0.8554979437184423, 0.7470464580062841, 0.93560875236022, 0.8891608818937488, 0.8422681783008208, 0.8201923132534198, 0.7674828124430099, 0.7314436759720575]
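
As a quick sanity check, here's a small sketch of the same nats-to-bits conversion applied to the `eval_loss` value printed in the log above. `nats_to_bits` is a helper name we introduce for illustration. We also convert the NLL of a uniform guess over two labels, which is consistent with the 1.0 bits shown for the first block (assuming that block is coded with a uniform prior).

```python
import math

def nats_to_bits(nll_nats):
    # Hypothetical helper: divide by ln(2) to go from nats to bits
    return nll_nats / math.log(2)

# eval_loss reported in the training log above (NLL in nats)
eval_loss = 0.5069981217384338
last_block_bits = nats_to_bits(eval_loss)
print('Last block codelength (bits):', last_block_bits)

# NLL of a uniform guess over two labels, converted to bits
uniform_bits = nats_to_bits(-math.log(1 / 2))
print('Uniform binary codelength (bits):', uniform_bits)
```

The first value should match the last entry of `codelengths` printed above.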


To get MDL, let's compute the number of examples that were in each block:

In [16]:
block_sizes = [(block_end - block_start) for block_start, block_end in zip(block_start_idxs[:-1], block_start_idxs[1:])]
print('Block Sizes:', block_sizes)

Block Sizes: [64, 42, 70, 116, 193, 319, 529, 878, 1457]
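
To make the relationship between block boundaries and block sizes concrete, here's a sketch that rebuilds a list of boundaries from the sizes printed above and then recovers the sizes again. We assume here that `block_start_idxs` starts at 0 and ends at the total number of examples; that is our assumption, so check it against how you built your own blocks.

```python
from itertools import accumulate

# Block sizes printed above
block_sizes = [64, 42, 70, 116, 193, 319, 529, 878, 1457]

# Assumed layout: boundaries start at 0 and end at the dataset size
block_start_idxs = [0] + list(accumulate(block_sizes))
print('Boundaries:', block_start_idxs)

# Each block's size is the gap between consecutive boundaries
recovered = [end - start for start, end in zip(block_start_idxs[:-1], block_start_idxs[1:])]
print('Recovered sizes:', recovered)
print('Total examples:', block_start_idxs[-1])
```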


Then, we can compute the codelength for sending each block and sum those values to get MDL:

In [17]:
mdl = sum(block_size * per_sample_codelength for block_size, per_sample_codelength in zip(block_sizes, codelengths))
print('MDL:', round(mdl), 'bits')

MDL: 2874 bits
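
If you plan to compute MDL for several versions of a dataset, it may help to wrap the last step in a function. Here's a minimal sketch using the codelengths and block sizes printed above; `compute_mdl` is a name we introduce for illustration.

```python
def compute_mdl(codelengths_bits, block_sizes):
    # Hypothetical helper: total bits = sum over blocks of (size * per-sample bits)
    if len(codelengths_bits) != len(block_sizes):
        raise ValueError('Need exactly one codelength per block')
    return sum(size * bits for size, bits in zip(block_sizes, codelengths_bits))

# Values printed in the cells above
codelengths = [1.0, 0.8554979437184423, 0.7470464580062841, 0.93560875236022,
               0.8891608818937488, 0.8422681783008208, 0.8201923132534198,
               0.7674828124430099, 0.7314436759720575]
block_sizes = [64, 42, 70, 116, 193, 319, 529, 878, 1457]

mdl = compute_mdl(codelengths, block_sizes)
print('MDL:', round(mdl), 'bits')
print('Average bits per example:', mdl / sum(block_sizes))
```

Running the same computation on the losses from a modified version of the inputs gives a second MDL to compare against this one.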



And that's how you compute MDL! To conduct RDA, just modify the dataset inputs (where described above) and see how the MDL changes. Now you're ready to analyze your own datasets 🥳