Copyright (c) Microsoft Corporation.
Licensed under the MIT License.

# Abstractive Summarization using BertSumAbs on the CNN/DailyMail Dataset

## Summary

This notebook demonstrates how to fine-tune BERT for abstractive text summarization. Utility functions and classes in the NLP Best Practices repo are used for data preprocessing, model training, model scoring, result postprocessing, and model evaluation.

### Abstractive Summarization
Abstractive summarization is the task of taking an input text and summarizing its content in a shorter output text. In contrast to extractive summarization, which selects sentences directly from the input text, abstractive summarization rephrases the input and can produce words and sentences that do not appear in it.

### BertSumAbs

BertSumAbs refers to a BERT-based abstractive summarization algorithm introduced in [Text Summarization with Pretrained Encoders](https://arxiv.org/abs/1908.08345), with [published examples](https://github.com/nlpyang/PreSumm). It uses the pretrained BERT model as the encoder and fine-tunes both the encoder and the decoder on a labeled summarization dataset such as the [CNN/DM dataset](https://github.com/harvardnlp/sent-summary).

The figure below compares the architecture of the original BERT model (left) with BERTSUM (right), on which BertSumAbs is built. In BERTSUM, an input document is split into sentences, and a [CLS] token is inserted before and a [SEP] token after each sentence. Each token of the resulting sequence is then represented by the sum of three embeddings (token, segment, and position) before being fed into the transformer layers. The position embeddings in BertSumAbs are extended beyond 512 positions, which is the maximum input length of the BERT model, so longer inputs can be encoded.
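The input layout described above can be sketched in plain Python. This is a minimal illustration only: it assumes whitespace tokenization in place of BERT's WordPiece tokenizer and the alternating ("interval") segment ids described in the BertSum paper, and `build_bertsum_input` is a hypothetical helper, not part of the NLP Best Practices utilities.

```python
def build_bertsum_input(sentences, max_len=512):
    """Build BERTSUM-style token, segment and position sequences (illustrative).

    Each sentence is wrapped as [CLS] ... [SEP]; segment ids alternate 0/1
    per sentence; position ids count up from 0. In the model, each position
    looks up a token, a segment and a position embedding, and the three
    vectors are summed. Whitespace splitting stands in for WordPiece.
    """
    tokens, segment_ids = [], []
    for i, sentence in enumerate(sentences):
        piece = ["[CLS]"] + sentence.split() + ["[SEP]"]
        tokens.extend(piece)
        segment_ids.extend([i % 2] * len(piece))
    # Truncate to the maximum number of positions the encoder supports.
    tokens, segment_ids = tokens[:max_len], segment_ids[:max_len]
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


tokens, segments, positions = build_bertsum_input(["the cat sat", "it was happy"])
print(tokens)
print(segments)
```

With `max_len` raised above 512, the same layout simply yields position ids past 511, which is why the position-embedding table itself has to grow.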

Note that the figure shows only the encoder. The decoder in BertSumAbs is a multi-layer transformer that is randomly initialized. Because the encoder is pretrained while the decoder is not, the mismatch between them can make fine-tuning unstable. BertSumAbs therefore uses separate optimizers for the encoder and the decoder, each with its own learning-rate schedule. During text generation, techniques such as trigram blocking and beam search can be used to improve the quality of the generated summaries.
<img src="https://nlpbp.blob.core.windows.net/images/BertForSummarization.PNG">
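Trigram blocking, mentioned above, rejects any continuation that would repeat a trigram already present in the summary generated so far, which reduces repetitive output. The sketch below filters candidate next tokens with an exact-match check; `block_trigrams` and its helpers are hypothetical, and reference implementations typically apply the same test inside beam search by giving a blocked hypothesis a very low score rather than filtering a token list.

```python
def trigrams(tokens):
    """Return consecutive token trigrams as tuples."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def has_repeated_trigram(tokens):
    """True if any trigram occurs more than once in `tokens`."""
    seen = set()
    for tri in trigrams(tokens):
        if tri in seen:
            return True
        seen.add(tri)
    return False


def block_trigrams(prefix, candidates):
    """Keep only candidate next tokens that do not create a repeated trigram."""
    return [tok for tok in candidates if not has_repeated_trigram(prefix + [tok])]


prefix = "the storm hit the coast and the storm".split()
print(block_trigrams(prefix, ["hit", "moved"]))  # ['moved']
```

Here "hit" is blocked because "the storm hit" already occurs in the prefix.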


## Before you start

It's recommended to run this notebook on GPU machines, as it is very computationally intensive. Set QUICK_RUN = True to run the notebook on a small subset of the data with fewer training steps. With QUICK_RUN = False, the notebook takes about 5 hours to run on a VM with four 16GB NVIDIA V100 GPUs: fine-tuning takes around 1.5 hours and inference around 3.5 hours. Better performance can be achieved by increasing MAX_STEPS.

* **ROUGE Evaluation**: To run ROUGE evaluation, refer to the compute_rouge_perl section of [summarization_evaluation.ipynb](./summarization_evaluation.ipynb) for setup.

* **Distributed Training**:
Note that this Jupyter notebook can only use PyTorch [DataParallel](https://pytorch.org/docs/master/nn.html#dataparallel). Higher speed and larger batch sizes can be achieved with PyTorch [DistributedDataParallel](https://pytorch.org/docs/master/notes/ddp.html) (DDP). The script [abstractive_summarization_bertsum_cnndm_distributed_train.py](./abstractive_summarization_bertsum_cnndm_distributed_train.py) shows an example of how to use DDP.

* **Mixed Precision Training**:
By default, this notebook doesn't use mixed precision training. Higher speed and larger batch sizes can be achieved by setting FP16 to True. Refer to https://nvidia.github.io/apex and https://github.com/nvidia/apex for details on mixed precision training, and check whether the GPU model on your machine supports it. Mixed precision inference is also enabled in the prediction utility function. With mixed precision training and/or inference, model performance can be slightly worse than in full precision mode.
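For intuition about what ROUGE measures, the sketch below computes a ROUGE-N F1 score from clipped n-gram overlap between one candidate and one reference. It is a simplified stand-in (whitespace tokens, lowercasing, no stemming, single reference), not the `compute_rouge_python` or Perl ROUGE implementations used in this repo, so its numbers will not match theirs exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 from clipped n-gram overlap of two whitespace-tokenized strings."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per n-gram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge_n_f1("the cat sat on the mat", "the cat lay on the mat"))       # ~0.833
print(rouge_n_f1("the cat sat on the mat", "the cat lay on the mat", n=2))  # ~0.6
```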

In [1]:
QUICK_RUN = False

In [2]:
import os
os.getcwd()

'/home/ge75zam2/german-bertabs/examples/text_summarization'

In [3]:
import os
import shutil
import sys
from tempfile import TemporaryDirectory
import torch

# make the repository root (two levels up) importable so that utils_nlp can be found
nlp_path = os.path.abspath("../../")
if nlp_path not in sys.path:
    sys.path.insert(0, nlp_path)

from utils_nlp.models.transformers.abstractive_summarization_bertsum import (
    BertSumAbs,
    BertSumAbsProcessor,
    validate
)

from utils_nlp.eval import compute_rouge_python

from utils_nlp.models.transformers.datasets import SummarizationDataset
import nltk
from nltk import tokenize

import pandas as pd
import pprint
import scrapbook as sb

In [4]:
import sys
sys.path

['/home/ge75zam2/german-bertabs',
 '/home/ge75zam2/miniconda3/envs/finetune/lib/python36.zip',
 '/home/ge75zam2/miniconda3/envs/finetune/lib/python3.6',
 '/home/ge75zam2/miniconda3/envs/finetune/lib/python3.6/lib-dynload',
 '',
 '/home/ge75zam2/miniconda3/envs/finetune/lib/python3.6/site-packages',
 '/home/ge75zam2/miniconda3/envs/finetune/lib/python3.6/site-packages/IPython/extensions',
 '/home/ge75zam2/.ipython']

## Data Preprocessing

The original version of this notebook uses the CNN/DM dataset, which contains documents and accompanying questions derived from CNN and Daily Mail news articles. The highlights of each article are used as its summary. The dataset consists of ~289K training examples, ~11K validation examples, and ~11K test examples. The news articles are 781 tokens long on average, and the summaries average 3.75 sentences and 56 tokens. The code below instead loads `SwissSummarizationDataset` from `utils_nlp.dataset.swiss`, split into training, validation, and test sets.

The main data preprocessing step is splitting each input document into sentences.
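The notebook imports `nltk.tokenize`, whose `sent_tokenize` needs the Punkt model data to be downloaded before use. As a dependency-free illustration of sentence splitting only, here is a naive regex-based splitter; it is an assumption, not what the utilities use, and it mis-splits abbreviations such as "Dr. Müller", which Punkt handles better.

```python
import re

# Split after ".", "!" or "?" when followed by whitespace and an uppercase
# letter (including German umlauts). A naive heuristic, not nltk's Punkt model.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")


def split_sentences(text):
    """Split `text` into sentences using the boundary heuristic above."""
    return [s.strip() for s in _SENT_BOUNDARY.split(text.strip()) if s.strip()]


doc = "Der Zug kam spät an. Viele Reisende warteten! Wann fährt der nächste?"
print(split_sentences(doc))
```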

In [5]:
# the data path used to save the downloaded data file
#DATA_PATH = TemporaryDirectory().name
# The number of lines at the head of data file used for preprocessing. -1 means all the lines.
TOP_N = 100
if not QUICK_RUN:
    TOP_N = -1

In [6]:
#train_dataset, test_dataset = CNNDMSummarizationDataset(
#    top_n=TOP_N, local_cache_path=DATA_PATH, prepare_extractive=False
#)

from utils_nlp.dataset.swiss import SwissSummarizationDataset

# load the Swiss summarization dataset as train/validation/test splits
train_dataset, validation_dataset, test_dataset = SwissSummarizationDataset(top_n=TOP_N, validation=True)


In [7]:
len(train_dataset), len(validation_dataset), len(test_dataset)

(85500, 9500, 5000)

## Model Finetuning

In [8]:
# notebook parameters
# the cache path
CACHE_PATH = '/home/ge75zam2/finetuning'

# model parameters
MODEL_NAME = "bert-base-german-cased"
MAX_POS = 768
MAX_SOURCE_SEQ_LENGTH = 640
MAX_TARGET_SEQ_LENGTH = 140

# mixed precision setting. To enable mixed precision training, follow instructions in SETUP.md.
FP16 = False
if FP16:
    FP16_OPT_LEVEL = "O2"

# fine-tuning parameters
# batch size per GPU, in number of examples
BATCH_SIZE_PER_GPU = 1


# number of GPUs used for training
NUM_GPUS = torch.cuda.device_count()

if NUM_GPUS > 0:
    BATCH_SIZE = NUM_GPUS * BATCH_SIZE_PER_GPU
else:
    BATCH_SIZE = 1


# learning rates for the encoder (BERT) and the decoder
LEARNING_RATE_BERT = 5e-4 / 2.0
LEARNING_RATE_DEC = 0.05 / 2.0


# how often (in steps) training statistics are reported and the model checkpoint is saved
REPORT_EVERY = 100
SAVE_EVERY = 1000

# total number of steps for training
MAX_STEPS = 1000

if not QUICK_RUN:
    MAX_STEPS = 10e6

WARMUP_STEPS_BERT = 2000
WARMUP_STEPS_DEC = 1000
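The BertSumAbs paper schedules the encoder and decoder learning rates separately with a warmup-then-decay rule, `lr = lr_base * min(step ** -0.5, step * warmup ** -1.5)`, which grows linearly until `step == warmup` and then decays with the inverse square root of the step. The sketch below evaluates that rule with this notebook's values; it assumes `BertSumAbs.fit` applies the same rule to `learning_rate_bert`/`warmup_steps_bert` and `learning_rate_dec`/`warmup_steps_dec`, which should be checked against the utility's source. `noam_lr` is a hypothetical helper.

```python
def noam_lr(base_lr, step, warmup_steps):
    """Linear warmup, then inverse-square-root decay; peaks at step == warmup_steps."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return base_lr * min(step ** -0.5, step * warmup_steps ** -1.5)


# Values copied from the parameter cell above.
lr_bert, warmup_bert = 5e-4 / 2.0, 2000
lr_dec, warmup_dec = 0.05 / 2.0, 1000

for step in (100, 1000, 2000, 10000):
    enc = noam_lr(lr_bert, step, warmup_bert)
    dec = noam_lr(lr_dec, step, warmup_dec)
    print(f"step {step:>5}: encoder lr {enc:.2e}, decoder lr {dec:.2e}")
```

The separate, shorter warmup and larger base rate let the randomly initialized decoder move faster than the pretrained encoder.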

In [9]:
from utils_nlp.common.pytorch_utils import (
    compute_training_steps,
    get_amp,
    get_device,
    move_model_to_device,
    parallelize_model,
)

get_device(NUM_GPUS)

(device(type='cuda'), 8)

In [10]:
# processor that provides the collate function used to load the preprocessed data
processor = BertSumAbsProcessor(model_name=MODEL_NAME, cache_dir=CACHE_PATH, max_src_len=MAX_SOURCE_SEQ_LENGTH, max_tgt_len=MAX_TARGET_SEQ_LENGTH)
# summarizer
summarizer = BertSumAbs(
    processor, model_name=MODEL_NAME, cache_dir=CACHE_PATH, max_pos_length=MAX_POS
)
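`max_pos_length=MAX_POS` (768 here) is larger than the 512 positions BERT was pretrained with, so the position-embedding table has to be enlarged. In the PreSumm reference code, the pretrained rows are kept and each extra row is initialized as a copy of the embedding for the last pretrained position. The pure-Python sketch below shows that idea on a toy table; `extend_position_embeddings` is hypothetical, and whether `BertSumAbs` in this repo initializes the new rows the same way is an assumption.

```python
def extend_position_embeddings(table, new_max_pos):
    """Return a position-embedding table with `new_max_pos` rows.

    `table` is a list of equal-length vectors, one per pretrained position.
    Pretrained rows are copied unchanged; each new row is a copy of the last
    pretrained row.
    """
    if new_max_pos < len(table):
        raise ValueError("new_max_pos must be at least the pretrained length")
    extended = [row[:] for row in table]
    extended += [table[-1][:] for _ in range(new_max_pos - len(table))]
    return extended


# Toy table: 512 positions with 4-dimensional embeddings.
pretrained = [[float(p)] * 4 for p in range(512)]
extended = extend_position_embeddings(pretrained, 768)
print(len(extended), extended[767])  # 768 [511.0, 511.0, 511.0, 511.0]
```

In PyTorch the same copy would be done on the `weight` tensor of a larger `nn.Embedding`, after which the new rows are fine-tuned with the rest of the model.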

In [11]:
print(BATCH_SIZE_PER_GPU*NUM_GPUS)
print(MAX_STEPS)

8
10000000.0
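The printed effective batch size of 8 (one example on each of 8 GPUs) is consistent with the training log, which reports 800 examples per 100 steps. Together with the 85,500 training examples loaded earlier, it also explains the `10688` total in the progress bar: one pass over the training set takes ceil(85500 / 8) steps. If `fit` keeps cycling over the data until `max_steps` is reached (an assumption), `MAX_STEPS = 10e6` corresponds to several hundred epochs. The arithmetic:

```python
import math

n_train_examples = 85500  # len(train_dataset) printed above
n_gpus = 8                # from get_device(NUM_GPUS) above
batch_size_per_gpu = 1
max_steps = 10e6

batch_size = n_gpus * batch_size_per_gpu
steps_per_epoch = math.ceil(n_train_examples / batch_size)
epochs = max_steps / steps_per_epoch
print(batch_size, steps_per_epoch, round(epochs, 1))  # 8 10688 935.6
```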


In [12]:
def validation_function(summarizer):
    return validate(summarizer, validation_dataset, language='en')

In [None]:
summarizer.fit(
    train_dataset,
    num_gpus=NUM_GPUS,
    batch_size=BATCH_SIZE,
    max_steps=MAX_STEPS,
    learning_rate_bert=LEARNING_RATE_BERT,
    learning_rate_dec=LEARNING_RATE_DEC,
    warmup_steps_bert=WARMUP_STEPS_BERT,
    warmup_steps_dec=WARMUP_STEPS_DEC,
    save_every=SAVE_EVERY,
    report_every=REPORT_EVERY,
    fp16=FP16,
    # validation_function=validation_function,
    # checkpoint="saved checkpoint path",
)

device is cuda


Iteration:   1%|          | 100/10688 [01:48<2:14:33,  1.31it/s]

timestamp: 21/06/2020 15:22:20, average loss: 8.350365, time duration: 108.646673,
                            number of examples in current reporting: 800, step 100
                            out of total 10000000


Iteration:   2%|▏         | 200/10688 [03:12<2:21:10,  1.24it/s]

timestamp: 21/06/2020 15:23:44, average loss: 6.040869, time duration: 83.839989,
                            number of examples in current reporting: 800, step 200
                            out of total 10000000


Iteration:   3%|▎         | 300/10688 [04:36<2:34:00,  1.12it/s]

timestamp: 21/06/2020 15:25:07, average loss: 5.800758, time duration: 83.529028,
                            number of examples in current reporting: 800, step 300
                            out of total 10000000


Iteration:   4%|▎         | 400/10688 [06:00<2:22:49,  1.20it/s]

timestamp: 21/06/2020 15:26:31, average loss: 5.494409, time duration: 84.384645,
                            number of examples in current reporting: 800, step 400
                            out of total 10000000


Iteration:   5%|▍         | 500/10688 [07:24<2:19:32,  1.22it/s]

timestamp: 21/06/2020 15:27:56, average loss: 5.205037, time duration: 84.107093,
                            number of examples in current reporting: 800, step 500
                            out of total 10000000


Iteration:   6%|▌         | 600/10688 [08:49<2:32:37,  1.10it/s]

timestamp: 21/06/2020 15:29:20, average loss: 4.912124, time duration: 84.738849,
                            number of examples in current reporting: 800, step 600
                            out of total 10000000


Iteration:   7%|▋         | 700/10688 [10:14<2:23:28,  1.16it/s]

timestamp: 21/06/2020 15:30:45, average loss: 4.823152, time duration: 85.129598,
                            number of examples in current reporting: 800, step 700
                            out of total 10000000


Iteration:   7%|▋         | 800/10688 [11:38<2:16:42,  1.21it/s]

timestamp: 21/06/2020 15:32:10, average loss: 4.607586, time duration: 84.343263,
                            number of examples in current reporting: 800, step 800
                            out of total 10000000


Iteration:   8%|▊         | 900/10688 [13:02<2:27:17,  1.11it/s]

timestamp: 21/06/2020 15:33:33, average loss: 4.511585, time duration: 83.723667,
                            number of examples in current reporting: 800, step 900
                            out of total 10000000


Iteration:   9%|▉         | 999/10688 [14:25<2:22:31,  1.13it/s]

timestamp: 21/06/2020 15:34:57, average loss: 4.465706, time duration: 83.787730,
                            number of examples in current reporting: 800, step 1000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  10%|█         | 1100/10688 [15:53<2:12:28,  1.21it/s]

timestamp: 21/06/2020 15:36:24, average loss: 4.370876, time duration: 86.895396,
                            number of examples in current reporting: 800, step 1100
                            out of total 10000000


Iteration:  11%|█         | 1200/10688 [17:17<2:23:09,  1.10it/s]

timestamp: 21/06/2020 15:37:49, average loss: 4.283258, time duration: 84.673354,
                            number of examples in current reporting: 800, step 1200
                            out of total 10000000


Iteration:  12%|█▏        | 1300/10688 [18:41<2:09:11,  1.21it/s]

timestamp: 21/06/2020 15:39:12, average loss: 4.292709, time duration: 83.561607,
                            number of examples in current reporting: 800, step 1300
                            out of total 10000000


Iteration:  13%|█▎        | 1400/10688 [20:05<2:05:54,  1.23it/s]

timestamp: 21/06/2020 15:40:37, average loss: 4.167070, time duration: 84.170270,
                            number of examples in current reporting: 800, step 1400
                            out of total 10000000


Iteration:  14%|█▍        | 1500/10688 [21:31<2:07:55,  1.20it/s]

timestamp: 21/06/2020 15:42:02, average loss: 4.142542, time duration: 85.637323,
                            number of examples in current reporting: 800, step 1500
                            out of total 10000000


Iteration:  15%|█▍        | 1600/10688 [22:57<2:04:44,  1.21it/s]

timestamp: 21/06/2020 15:43:29, average loss: 4.053082, time duration: 86.757324,
                            number of examples in current reporting: 800, step 1600
                            out of total 10000000


Iteration:  16%|█▌        | 1700/10688 [24:22<2:12:51,  1.13it/s]

timestamp: 21/06/2020 15:44:53, average loss: 4.021414, time duration: 84.024739,
                            number of examples in current reporting: 800, step 1700
                            out of total 10000000


Iteration:  17%|█▋        | 1800/10688 [25:45<2:09:24,  1.14it/s]

timestamp: 21/06/2020 15:46:16, average loss: 3.958422, time duration: 83.310111,
                            number of examples in current reporting: 800, step 1800
                            out of total 10000000


Iteration:  18%|█▊        | 1900/10688 [27:10<2:00:31,  1.22it/s]

timestamp: 21/06/2020 15:47:42, average loss: 3.976216, time duration: 85.287461,
                            number of examples in current reporting: 800, step 1900
                            out of total 10000000


Iteration:  19%|█▊        | 1999/10688 [28:35<2:05:49,  1.15it/s]

timestamp: 21/06/2020 15:49:07, average loss: 3.861925, time duration: 85.383578,
                            number of examples in current reporting: 800, step 2000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  20%|█▉        | 2100/10688 [30:03<2:01:16,  1.18it/s]

timestamp: 21/06/2020 15:50:34, average loss: 3.852054, time duration: 87.321747,
                            number of examples in current reporting: 800, step 2100
                            out of total 10000000


Iteration:  21%|██        | 2200/10688 [31:27<1:54:35,  1.23it/s]

timestamp: 21/06/2020 15:51:58, average loss: 3.888199, time duration: 83.719975,
                            number of examples in current reporting: 800, step 2200
                            out of total 10000000


Iteration:  22%|██▏       | 2300/10688 [32:50<1:55:53,  1.21it/s]

timestamp: 21/06/2020 15:53:22, average loss: 3.812717, time duration: 83.926361,
                            number of examples in current reporting: 800, step 2300
                            out of total 10000000


Iteration:  22%|██▏       | 2400/10688 [34:14<1:50:41,  1.25it/s]

timestamp: 21/06/2020 15:54:45, average loss: 3.747386, time duration: 83.399191,
                            number of examples in current reporting: 800, step 2400
                            out of total 10000000


Iteration:  23%|██▎       | 2500/10688 [35:40<2:06:06,  1.08it/s]

timestamp: 21/06/2020 15:56:12, average loss: 3.745168, time duration: 86.573018,
                            number of examples in current reporting: 800, step 2500
                            out of total 10000000


Iteration:  24%|██▍       | 2600/10688 [37:05<1:49:05,  1.24it/s]

timestamp: 21/06/2020 15:57:37, average loss: 3.722823, time duration: 84.957280,
                            number of examples in current reporting: 800, step 2600
                            out of total 10000000


Iteration:  25%|██▌       | 2700/10688 [38:30<1:52:10,  1.19it/s]

timestamp: 21/06/2020 15:59:02, average loss: 3.661890, time duration: 84.908415,
                            number of examples in current reporting: 800, step 2700
                            out of total 10000000


Iteration:  26%|██▌       | 2800/10688 [39:55<1:46:16,  1.24it/s]

timestamp: 21/06/2020 16:00:26, average loss: 3.659604, time duration: 84.379911,
                            number of examples in current reporting: 800, step 2800
                            out of total 10000000


Iteration:  27%|██▋       | 2900/10688 [41:19<1:49:08,  1.19it/s]

timestamp: 21/06/2020 16:01:50, average loss: 3.695788, time duration: 84.259224,
                            number of examples in current reporting: 800, step 2900
                            out of total 10000000


Iteration:  28%|██▊       | 2999/10688 [42:42<1:45:56,  1.21it/s]

timestamp: 21/06/2020 16:03:14, average loss: 3.642765, time duration: 83.649727,
                            number of examples in current reporting: 800, step 3000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  29%|██▉       | 3100/10688 [44:09<1:39:43,  1.27it/s]

timestamp: 21/06/2020 16:04:40, average loss: 3.614862, time duration: 86.188142,
                            number of examples in current reporting: 800, step 3100
                            out of total 10000000


Iteration:  30%|██▉       | 3200/10688 [45:33<1:43:01,  1.21it/s]

timestamp: 21/06/2020 16:06:05, average loss: 3.642029, time duration: 84.259916,
                            number of examples in current reporting: 800, step 3200
                            out of total 10000000


Iteration:  31%|███       | 3300/10688 [46:56<1:46:53,  1.15it/s]

timestamp: 21/06/2020 16:07:28, average loss: 3.600018, time duration: 83.422373,
                            number of examples in current reporting: 800, step 3300
                            out of total 10000000


Iteration:  32%|███▏      | 3400/10688 [48:20<1:41:25,  1.20it/s]

timestamp: 21/06/2020 16:08:52, average loss: 3.594921, time duration: 83.617244,
                            number of examples in current reporting: 800, step 3400
                            out of total 10000000


Iteration:  33%|███▎      | 3500/10688 [49:42<1:35:08,  1.26it/s]

timestamp: 21/06/2020 16:10:14, average loss: 3.460396, time duration: 82.180457,
                            number of examples in current reporting: 800, step 3500
                            out of total 10000000


Iteration:  34%|███▎      | 3600/10688 [51:06<1:39:02,  1.19it/s]

timestamp: 21/06/2020 16:11:37, average loss: 3.509037, time duration: 83.549864,
                            number of examples in current reporting: 800, step 3600
                            out of total 10000000


Iteration:  35%|███▍      | 3700/10688 [52:31<1:35:05,  1.22it/s]

timestamp: 21/06/2020 16:13:02, average loss: 3.518779, time duration: 84.847338,
                            number of examples in current reporting: 800, step 3700
                            out of total 10000000


Iteration:  36%|███▌      | 3800/10688 [53:56<1:44:20,  1.10it/s]

timestamp: 21/06/2020 16:14:28, average loss: 3.472720, time duration: 85.603042,
                            number of examples in current reporting: 800, step 3800
                            out of total 10000000


Iteration:  36%|███▋      | 3900/10688 [55:21<1:33:45,  1.21it/s]

timestamp: 21/06/2020 16:15:53, average loss: 3.473419, time duration: 85.151808,
                            number of examples in current reporting: 800, step 3900
                            out of total 10000000


Iteration:  37%|███▋      | 3999/10688 [56:45<1:34:14,  1.18it/s]

timestamp: 21/06/2020 16:17:18, average loss: 3.459688, time duration: 84.772813,
                            number of examples in current reporting: 800, step 4000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  38%|███▊      | 4100/10688 [58:15<1:37:55,  1.12it/s]

timestamp: 21/06/2020 16:18:46, average loss: 3.490098, time duration: 88.535768,
                            number of examples in current reporting: 800, step 4100
                            out of total 10000000


Iteration:  39%|███▉      | 4200/10688 [59:38<1:28:03,  1.23it/s]

timestamp: 21/06/2020 16:20:10, average loss: 3.395469, time duration: 83.594274,
                            number of examples in current reporting: 800, step 4200
                            out of total 10000000


Iteration:  40%|████      | 4300/10688 [1:01:03<1:30:43,  1.17it/s]

timestamp: 21/06/2020 16:21:35, average loss: 3.401867, time duration: 84.865977,
                            number of examples in current reporting: 800, step 4300
                            out of total 10000000


Iteration:  41%|████      | 4400/10688 [1:02:28<1:31:20,  1.15it/s]

timestamp: 21/06/2020 16:22:59, average loss: 3.424215, time duration: 84.690752,
                            number of examples in current reporting: 800, step 4400
                            out of total 10000000


Iteration:  42%|████▏     | 4500/10688 [1:03:53<1:25:51,  1.20it/s]

timestamp: 21/06/2020 16:24:24, average loss: 3.388832, time duration: 84.790908,
                            number of examples in current reporting: 800, step 4500
                            out of total 10000000


Iteration:  43%|████▎     | 4600/10688 [1:05:17<1:21:42,  1.24it/s]

timestamp: 21/06/2020 16:25:49, average loss: 3.365556, time duration: 84.571383,
                            number of examples in current reporting: 800, step 4600
                            out of total 10000000


Iteration:  44%|████▍     | 4700/10688 [1:06:43<1:22:48,  1.21it/s]

timestamp: 21/06/2020 16:27:14, average loss: 3.359353, time duration: 85.324127,
                            number of examples in current reporting: 800, step 4700
                            out of total 10000000


Iteration:  45%|████▍     | 4800/10688 [1:08:08<1:23:52,  1.17it/s]

timestamp: 21/06/2020 16:28:40, average loss: 3.373320, time duration: 85.813130,
                            number of examples in current reporting: 800, step 4800
                            out of total 10000000


Iteration:  46%|████▌     | 4900/10688 [1:09:33<1:27:30,  1.10it/s]

timestamp: 21/06/2020 16:30:05, average loss: 3.375648, time duration: 85.016263,
                            number of examples in current reporting: 800, step 4900
                            out of total 10000000


Iteration:  47%|████▋     | 4999/10688 [1:10:58<1:21:41,  1.16it/s]

timestamp: 21/06/2020 16:31:30, average loss: 3.277544, time duration: 85.200375,
                            number of examples in current reporting: 800, step 5000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  48%|████▊     | 5100/10688 [1:12:27<1:22:28,  1.13it/s]

timestamp: 21/06/2020 16:32:58, average loss: 3.348529, time duration: 88.427312,
                            number of examples in current reporting: 800, step 5100
                            out of total 10000000


Iteration:  49%|████▊     | 5200/10688 [1:13:51<1:16:03,  1.20it/s]

timestamp: 21/06/2020 16:34:23, average loss: 3.335098, time duration: 84.449167,
                            number of examples in current reporting: 800, step 5200
                            out of total 10000000


Iteration:  50%|████▉     | 5300/10688 [1:15:15<1:09:27,  1.29it/s]

timestamp: 21/06/2020 16:35:46, average loss: 3.264541, time duration: 83.549630,
                            number of examples in current reporting: 800, step 5300
                            out of total 10000000


Iteration:  51%|█████     | 5400/10688 [1:16:41<1:12:50,  1.21it/s]

timestamp: 21/06/2020 16:37:12, average loss: 3.252502, time duration: 85.711949,
                            number of examples in current reporting: 800, step 5400
                            out of total 10000000


Iteration:  51%|█████▏    | 5500/10688 [1:18:07<1:17:21,  1.12it/s]

timestamp: 21/06/2020 16:38:38, average loss: 3.273807, time duration: 86.071720,
                            number of examples in current reporting: 800, step 5500
                            out of total 10000000


Iteration:  52%|█████▏    | 5600/10688 [1:19:30<1:11:23,  1.19it/s]

timestamp: 21/06/2020 16:40:01, average loss: 3.244501, time duration: 83.117159,
                            number of examples in current reporting: 800, step 5600
                            out of total 10000000


Iteration:  53%|█████▎    | 5700/10688 [1:20:54<1:07:10,  1.24it/s]

timestamp: 21/06/2020 16:41:25, average loss: 3.207497, time duration: 83.703153,
                            number of examples in current reporting: 800, step 5700
                            out of total 10000000


Iteration:  54%|█████▍    | 5800/10688 [1:22:17<1:05:41,  1.24it/s]

timestamp: 21/06/2020 16:42:49, average loss: 3.233177, time duration: 83.671665,
                            number of examples in current reporting: 800, step 5800
                            out of total 10000000


Iteration:  55%|█████▌    | 5900/10688 [1:23:41<1:12:38,  1.10it/s]

timestamp: 21/06/2020 16:44:13, average loss: 3.248525, time duration: 84.166529,
                            number of examples in current reporting: 800, step 5900
                            out of total 10000000


Iteration:  56%|█████▌    | 5999/10688 [1:25:04<1:07:05,  1.16it/s]

timestamp: 21/06/2020 16:45:37, average loss: 3.196170, time duration: 83.785335,
                            number of examples in current reporting: 800, step 6000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  57%|█████▋    | 6100/10688 [1:26:34<1:05:52,  1.16it/s]

timestamp: 21/06/2020 16:47:05, average loss: 3.222554, time duration: 88.511788,
                            number of examples in current reporting: 800, step 6100
                            out of total 10000000


Iteration:  58%|█████▊    | 6200/10688 [1:27:58<1:06:01,  1.13it/s]

timestamp: 21/06/2020 16:48:30, average loss: 3.166953, time duration: 84.641227,
                            number of examples in current reporting: 800, step 6200
                            out of total 10000000


Iteration:  59%|█████▉    | 6300/10688 [1:29:23<1:03:46,  1.15it/s]

timestamp: 21/06/2020 16:49:54, average loss: 3.198289, time duration: 84.180858,
                            number of examples in current reporting: 800, step 6300
                            out of total 10000000


Iteration:  60%|█████▉    | 6400/10688 [1:30:47<57:40,  1.24it/s]  

timestamp: 21/06/2020 16:51:18, average loss: 3.161820, time duration: 83.994955,
                            number of examples in current reporting: 800, step 6400
                            out of total 10000000


Iteration:  61%|██████    | 6500/10688 [1:32:11<59:49,  1.17it/s]  

timestamp: 21/06/2020 16:52:42, average loss: 3.183579, time duration: 84.398123,
                            number of examples in current reporting: 800, step 6500
                            out of total 10000000


Iteration:  62%|██████▏   | 6600/10688 [1:33:37<59:23,  1.15it/s]  

timestamp: 21/06/2020 16:54:08, average loss: 3.169305, time duration: 85.590400,
                            number of examples in current reporting: 800, step 6600
                            out of total 10000000


Iteration:  63%|██████▎   | 6700/10688 [1:35:02<55:19,  1.20it/s]  

timestamp: 21/06/2020 16:55:33, average loss: 3.167968, time duration: 85.357746,
                            number of examples in current reporting: 800, step 6700
                            out of total 10000000


Iteration:  64%|██████▎   | 6800/10688 [1:36:26<55:53,  1.16it/s]  

timestamp: 21/06/2020 16:56:58, average loss: 3.149842, time duration: 84.589440,
                            number of examples in current reporting: 800, step 6800
                            out of total 10000000


Iteration:  65%|██████▍   | 6900/10688 [1:37:51<51:40,  1.22it/s]  

timestamp: 21/06/2020 16:58:22, average loss: 3.137630, time duration: 84.483419,
                            number of examples in current reporting: 800, step 6900
                            out of total 10000000


Iteration:  65%|██████▌   | 6999/10688 [1:39:15<50:50,  1.21it/s]

timestamp: 21/06/2020 16:59:48, average loss: 3.216316, time duration: 85.117683,
                            number of examples in current reporting: 800, step 7000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  66%|██████▋   | 7100/10688 [1:40:43<49:46,  1.20it/s]  

timestamp: 21/06/2020 17:01:15, average loss: 3.187264, time duration: 87.337562,
                            number of examples in current reporting: 800, step 7100
                            out of total 10000000


Iteration:  67%|██████▋   | 7200/10688 [1:42:08<49:40,  1.17it/s]

timestamp: 21/06/2020 17:02:39, average loss: 3.118619, time duration: 84.208252,
                            number of examples in current reporting: 800, step 7200
                            out of total 10000000


Iteration:  68%|██████▊   | 7300/10688 [1:43:32<51:17,  1.10it/s]

timestamp: 21/06/2020 17:04:03, average loss: 3.105317, time duration: 83.968469,
                            number of examples in current reporting: 800, step 7300
                            out of total 10000000


Iteration:  69%|██████▉   | 7400/10688 [1:44:57<47:12,  1.16it/s]

timestamp: 21/06/2020 17:05:29, average loss: 3.179133, time duration: 85.421437,
                            number of examples in current reporting: 800, step 7400
                            out of total 10000000


Iteration:  70%|███████   | 7500/10688 [1:46:21<45:46,  1.16it/s]

timestamp: 21/06/2020 17:06:52, average loss: 3.080246, time duration: 83.973085,
                            number of examples in current reporting: 800, step 7500
                            out of total 10000000


Iteration:  71%|███████   | 7600/10688 [1:47:44<44:09,  1.17it/s]

timestamp: 21/06/2020 17:08:16, average loss: 3.118516, time duration: 83.429632,
                            number of examples in current reporting: 800, step 7600
                            out of total 10000000


Iteration:  72%|███████▏  | 7700/10688 [1:49:09<39:48,  1.25it/s]

timestamp: 21/06/2020 17:09:41, average loss: 3.069415, time duration: 84.694021,
                            number of examples in current reporting: 800, step 7700
                            out of total 10000000


Iteration:  73%|███████▎  | 7800/10688 [1:50:36<38:25,  1.25it/s]

timestamp: 21/06/2020 17:11:07, average loss: 3.158114, time duration: 86.600054,
                            number of examples in current reporting: 800, step 7800
                            out of total 10000000


Iteration:  74%|███████▍  | 7900/10688 [1:51:59<40:26,  1.15it/s]

timestamp: 21/06/2020 17:12:31, average loss: 3.034283, time duration: 83.773458,
                            number of examples in current reporting: 800, step 7900
                            out of total 10000000


Iteration:  75%|███████▍  | 7999/10688 [1:53:24<37:55,  1.18it/s]

timestamp: 21/06/2020 17:13:56, average loss: 3.061304, time duration: 85.085081,
                            number of examples in current reporting: 800, step 8000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  76%|███████▌  | 8100/10688 [1:54:52<34:55,  1.24it/s]  

timestamp: 21/06/2020 17:15:24, average loss: 3.107670, time duration: 87.769370,
                            number of examples in current reporting: 800, step 8100
                            out of total 10000000


Iteration:  77%|███████▋  | 8200/10688 [1:56:17<37:11,  1.12it/s]

timestamp: 21/06/2020 17:16:49, average loss: 3.081606, time duration: 84.887732,
                            number of examples in current reporting: 800, step 8200
                            out of total 10000000


Iteration:  78%|███████▊  | 8300/10688 [1:57:42<33:11,  1.20it/s]

timestamp: 21/06/2020 17:18:13, average loss: 3.067569, time duration: 84.374873,
                            number of examples in current reporting: 800, step 8300
                            out of total 10000000


Iteration:  79%|███████▊  | 8400/10688 [1:59:08<32:10,  1.19it/s]

timestamp: 21/06/2020 17:19:39, average loss: 3.158092, time duration: 86.103435,
                            number of examples in current reporting: 800, step 8400
                            out of total 10000000


Iteration:  80%|███████▉  | 8500/10688 [2:00:33<28:38,  1.27it/s]

timestamp: 21/06/2020 17:21:04, average loss: 3.084071, time duration: 84.910197,
                            number of examples in current reporting: 800, step 8500
                            out of total 10000000


Iteration:  80%|████████  | 8600/10688 [2:01:57<30:38,  1.14it/s]

timestamp: 21/06/2020 17:22:28, average loss: 3.020362, time duration: 84.051013,
                            number of examples in current reporting: 800, step 8600
                            out of total 10000000


Iteration:  81%|████████▏ | 8700/10688 [2:03:21<26:50,  1.23it/s]

timestamp: 21/06/2020 17:23:52, average loss: 3.073369, time duration: 83.957370,
                            number of examples in current reporting: 800, step 8700
                            out of total 10000000


Iteration:  82%|████████▏ | 8800/10688 [2:04:44<24:37,  1.28it/s]

timestamp: 21/06/2020 17:25:16, average loss: 3.046112, time duration: 83.802972,
                            number of examples in current reporting: 800, step 8800
                            out of total 10000000


Iteration:  83%|████████▎ | 8900/10688 [2:06:09<23:39,  1.26it/s]

timestamp: 21/06/2020 17:26:40, average loss: 3.045682, time duration: 84.392816,
                            number of examples in current reporting: 800, step 8900
                            out of total 10000000


Iteration:  84%|████████▍ | 8999/10688 [2:07:33<24:23,  1.15it/s]

timestamp: 21/06/2020 17:28:05, average loss: 3.096648, time duration: 85.127188,
                            number of examples in current reporting: 800, step 9000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  85%|████████▌ | 9100/10688 [2:09:01<21:45,  1.22it/s]

timestamp: 21/06/2020 17:29:33, average loss: 3.039398, time duration: 87.078650,
                            number of examples in current reporting: 800, step 9100
                            out of total 10000000


Iteration:  86%|████████▌ | 9200/10688 [2:10:25<21:51,  1.13it/s]

timestamp: 21/06/2020 17:30:57, average loss: 2.977601, time duration: 84.336556,
                            number of examples in current reporting: 800, step 9200
                            out of total 10000000


Iteration:  87%|████████▋ | 9300/10688 [2:11:49<19:41,  1.18it/s]

timestamp: 21/06/2020 17:32:21, average loss: 3.034763, time duration: 84.120787,
                            number of examples in current reporting: 800, step 9300
                            out of total 10000000


Iteration:  88%|████████▊ | 9400/10688 [2:13:15<17:18,  1.24it/s]

timestamp: 21/06/2020 17:33:47, average loss: 3.002594, time duration: 85.732281,
                            number of examples in current reporting: 800, step 9400
                            out of total 10000000


Iteration:  89%|████████▉ | 9500/10688 [2:14:41<16:44,  1.18it/s]

timestamp: 21/06/2020 17:35:12, average loss: 3.031955, time duration: 85.598836,
                            number of examples in current reporting: 800, step 9500
                            out of total 10000000


Iteration:  90%|████████▉ | 9600/10688 [2:16:07<15:22,  1.18it/s]

timestamp: 21/06/2020 17:36:38, average loss: 2.966390, time duration: 85.799335,
                            number of examples in current reporting: 800, step 9600
                            out of total 10000000


Iteration:  91%|█████████ | 9700/10688 [2:17:32<14:37,  1.13it/s]

timestamp: 21/06/2020 17:38:03, average loss: 3.010634, time duration: 85.278526,
                            number of examples in current reporting: 800, step 9700
                            out of total 10000000


Iteration:  92%|█████████▏| 9800/10688 [2:18:59<12:41,  1.17it/s]

timestamp: 21/06/2020 17:39:30, average loss: 2.965256, time duration: 87.033680,
                            number of examples in current reporting: 800, step 9800
                            out of total 10000000


Iteration:  93%|█████████▎| 9900/10688 [2:20:22<11:20,  1.16it/s]

timestamp: 21/06/2020 17:40:54, average loss: 2.926347, time duration: 83.390107,
                            number of examples in current reporting: 800, step 9900
                            out of total 10000000


Iteration:  94%|█████████▎| 9999/10688 [2:21:45<09:16,  1.24it/s]

timestamp: 21/06/2020 17:42:18, average loss: 2.971803, time duration: 83.991845,
                            number of examples in current reporting: 800, step 10000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  94%|█████████▍| 10100/10688 [2:23:16<08:00,  1.22it/s]

timestamp: 21/06/2020 17:43:47, average loss: 2.978202, time duration: 89.341929,
                            number of examples in current reporting: 800, step 10100
                            out of total 10000000


Iteration:  95%|█████████▌| 10200/10688 [2:24:41<06:42,  1.21it/s]

timestamp: 21/06/2020 17:45:12, average loss: 2.995753, time duration: 84.954809,
                            number of examples in current reporting: 800, step 10200
                            out of total 10000000


Iteration:  96%|█████████▋| 10300/10688 [2:26:05<05:31,  1.17it/s]

timestamp: 21/06/2020 17:46:37, average loss: 2.902990, time duration: 84.582299,
                            number of examples in current reporting: 800, step 10300
                            out of total 10000000


Iteration:  97%|█████████▋| 10400/10688 [2:27:30<03:52,  1.24it/s]

timestamp: 21/06/2020 17:48:01, average loss: 2.972499, time duration: 84.788042,
                            number of examples in current reporting: 800, step 10400
                            out of total 10000000


Iteration:  98%|█████████▊| 10500/10688 [2:28:55<02:41,  1.17it/s]

timestamp: 21/06/2020 17:49:26, average loss: 2.934527, time duration: 84.538169,
                            number of examples in current reporting: 800, step 10500
                            out of total 10000000


Iteration:  99%|█████████▉| 10600/10688 [2:30:20<01:18,  1.12it/s]

timestamp: 21/06/2020 17:50:52, average loss: 3.006098, time duration: 85.503175,
                            number of examples in current reporting: 800, step 10600
                            out of total 10000000


Iteration: 100%|██████████| 10688/10688 [2:31:34<00:00,  1.18it/s]
Iteration:   0%|          | 12/10688 [00:10<2:32:37,  1.17it/s]

timestamp: 21/06/2020 17:52:16, average loss: 2.920285, time duration: 84.686501,
                            number of examples in current reporting: 796, step 10700
                            out of total 10000000


Iteration:   1%|          | 112/10688 [01:35<2:32:13,  1.16it/s]

timestamp: 21/06/2020 17:53:41, average loss: 2.846944, time duration: 84.575039,
                            number of examples in current reporting: 800, step 10800
                            out of total 10000000


Iteration:   2%|▏         | 212/10688 [02:59<2:20:42,  1.24it/s]

timestamp: 21/06/2020 17:55:05, average loss: 2.861294, time duration: 84.520090,
                            number of examples in current reporting: 800, step 10900
                            out of total 10000000


Iteration:   3%|▎         | 311/10688 [04:23<2:18:44,  1.25it/s]

timestamp: 21/06/2020 17:56:31, average loss: 2.834417, time duration: 85.340320,
                            number of examples in current reporting: 800, step 11000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:   4%|▍         | 412/10688 [05:53<2:29:26,  1.15it/s]

timestamp: 21/06/2020 17:57:59, average loss: 2.918698, time duration: 88.515992,
                            number of examples in current reporting: 800, step 11100
                            out of total 10000000


Iteration:   5%|▍         | 512/10688 [07:19<2:16:21,  1.24it/s]

timestamp: 21/06/2020 17:59:25, average loss: 2.853466, time duration: 85.721538,
                            number of examples in current reporting: 800, step 11200
                            out of total 10000000


Iteration:   6%|▌         | 612/10688 [08:44<2:33:09,  1.10it/s]

timestamp: 21/06/2020 18:00:51, average loss: 2.809703, time duration: 85.738022,
                            number of examples in current reporting: 800, step 11300
                            out of total 10000000


Iteration:   7%|▋         | 712/10688 [10:09<2:11:46,  1.26it/s]

timestamp: 21/06/2020 18:02:15, average loss: 2.891629, time duration: 84.536797,
                            number of examples in current reporting: 800, step 11400
                            out of total 10000000


Iteration:   8%|▊         | 812/10688 [11:34<2:21:52,  1.16it/s]

timestamp: 21/06/2020 18:03:40, average loss: 2.871207, time duration: 85.136546,
                            number of examples in current reporting: 800, step 11500
                            out of total 10000000


Iteration:   9%|▊         | 912/10688 [12:59<2:09:44,  1.26it/s]

timestamp: 21/06/2020 18:05:05, average loss: 2.771579, time duration: 84.683186,
                            number of examples in current reporting: 800, step 11600
                            out of total 10000000


Iteration:   9%|▉         | 1012/10688 [14:24<2:19:14,  1.16it/s]

timestamp: 21/06/2020 18:06:30, average loss: 2.863332, time duration: 85.281762,
                            number of examples in current reporting: 800, step 11700
                            out of total 10000000


Iteration:  10%|█         | 1112/10688 [15:49<2:19:07,  1.15it/s]

timestamp: 21/06/2020 18:07:55, average loss: 2.846860, time duration: 84.858171,
                            number of examples in current reporting: 800, step 11800
                            out of total 10000000


Iteration:  11%|█▏        | 1212/10688 [17:14<2:15:54,  1.16it/s]

timestamp: 21/06/2020 18:09:20, average loss: 2.809895, time duration: 84.718072,
                            number of examples in current reporting: 800, step 11900
                            out of total 10000000


Iteration:  12%|█▏        | 1311/10688 [18:36<2:12:59,  1.18it/s]

timestamp: 21/06/2020 18:10:43, average loss: 2.790324, time duration: 83.654662,
                            number of examples in current reporting: 800, step 12000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  13%|█▎        | 1412/10688 [20:05<2:04:46,  1.24it/s]

timestamp: 21/06/2020 18:12:11, average loss: 2.880523, time duration: 87.819880,
                            number of examples in current reporting: 800, step 12100
                            out of total 10000000


Iteration:  14%|█▍        | 1512/10688 [21:31<2:14:21,  1.14it/s]

timestamp: 21/06/2020 18:13:37, average loss: 2.836458, time duration: 85.697299,
                            number of examples in current reporting: 800, step 12200
                            out of total 10000000


Iteration:  15%|█▌        | 1612/10688 [22:59<2:09:30,  1.17it/s]

timestamp: 21/06/2020 18:15:05, average loss: 2.836545, time duration: 87.686787,
                            number of examples in current reporting: 800, step 12300
                            out of total 10000000


Iteration:  16%|█▌        | 1712/10688 [24:24<2:05:50,  1.19it/s]

timestamp: 21/06/2020 18:16:30, average loss: 2.765739, time duration: 85.221069,
                            number of examples in current reporting: 800, step 12400
                            out of total 10000000


Iteration:  17%|█▋        | 1812/10688 [25:49<1:57:37,  1.26it/s]

timestamp: 21/06/2020 18:17:55, average loss: 2.853320, time duration: 84.882902,
                            number of examples in current reporting: 800, step 12500
                            out of total 10000000


Iteration:  18%|█▊        | 1912/10688 [27:14<2:01:25,  1.20it/s]

timestamp: 21/06/2020 18:19:20, average loss: 2.807057, time duration: 85.201152,
                            number of examples in current reporting: 800, step 12600
                            out of total 10000000


Iteration:  19%|█▉        | 2012/10688 [28:37<2:07:46,  1.13it/s]

timestamp: 21/06/2020 18:20:44, average loss: 2.796912, time duration: 83.616560,
                            number of examples in current reporting: 800, step 12700
                            out of total 10000000


Iteration:  20%|█▉        | 2112/10688 [30:02<2:08:08,  1.12it/s]

timestamp: 21/06/2020 18:22:08, average loss: 2.796693, time duration: 84.266901,
                            number of examples in current reporting: 800, step 12800
                            out of total 10000000


Iteration:  21%|██        | 2212/10688 [31:26<1:54:36,  1.23it/s]

timestamp: 21/06/2020 18:23:32, average loss: 2.772001, time duration: 83.955536,
                            number of examples in current reporting: 800, step 12900
                            out of total 10000000


Iteration:  22%|██▏       | 2311/10688 [32:49<1:53:45,  1.23it/s]

timestamp: 21/06/2020 18:24:56, average loss: 2.756612, time duration: 84.403414,
                            number of examples in current reporting: 800, step 13000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  23%|██▎       | 2412/10688 [34:18<1:56:44,  1.18it/s]

timestamp: 21/06/2020 18:26:24, average loss: 2.821820, time duration: 88.185304,
                            number of examples in current reporting: 800, step 13100
                            out of total 10000000


Iteration:  24%|██▎       | 2512/10688 [35:43<1:58:04,  1.15it/s]

timestamp: 21/06/2020 18:27:49, average loss: 2.757024, time duration: 84.716439,
                            number of examples in current reporting: 800, step 13200
                            out of total 10000000


Iteration:  24%|██▍       | 2612/10688 [37:08<1:55:21,  1.17it/s]

timestamp: 21/06/2020 18:29:15, average loss: 2.791712, time duration: 85.399869,
                            number of examples in current reporting: 800, step 13300
                            out of total 10000000


Iteration:  25%|██▌       | 2712/10688 [38:32<1:56:22,  1.14it/s]

timestamp: 21/06/2020 18:30:38, average loss: 2.800629, time duration: 83.484075,
                            number of examples in current reporting: 800, step 13400
                            out of total 10000000


Iteration:  26%|██▋       | 2812/10688 [39:56<1:44:19,  1.26it/s]

timestamp: 21/06/2020 18:32:02, average loss: 2.758840, time duration: 84.366177,
                            number of examples in current reporting: 800, step 13500
                            out of total 10000000


Iteration:  27%|██▋       | 2912/10688 [41:20<1:44:02,  1.25it/s]

timestamp: 21/06/2020 18:33:26, average loss: 2.794317, time duration: 83.748974,
                            number of examples in current reporting: 800, step 13600
                            out of total 10000000


Iteration:  28%|██▊       | 3012/10688 [42:44<1:46:18,  1.20it/s]

timestamp: 21/06/2020 18:34:51, average loss: 2.858212, time duration: 84.476852,
                            number of examples in current reporting: 800, step 13700
                            out of total 10000000


Iteration:  29%|██▉       | 3112/10688 [44:09<1:57:02,  1.08it/s]

timestamp: 21/06/2020 18:36:15, average loss: 2.821808, time duration: 84.809187,
                            number of examples in current reporting: 800, step 13800
                            out of total 10000000


Iteration:  30%|███       | 3212/10688 [45:35<1:44:59,  1.19it/s]

timestamp: 21/06/2020 18:37:41, average loss: 2.757809, time duration: 85.268684,
                            number of examples in current reporting: 800, step 13900
                            out of total 10000000


Iteration:  31%|███       | 3311/10688 [46:58<1:46:01,  1.16it/s]

timestamp: 21/06/2020 18:39:06, average loss: 2.733621, time duration: 84.842707,
                            number of examples in current reporting: 800, step 14000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  32%|███▏      | 3412/10688 [48:28<1:46:14,  1.14it/s]

timestamp: 21/06/2020 18:40:34, average loss: 2.793266, time duration: 88.666754,
                            number of examples in current reporting: 800, step 14100
                            out of total 10000000


Iteration:  33%|███▎      | 3512/10688 [49:52<1:39:55,  1.20it/s]

timestamp: 21/06/2020 18:41:58, average loss: 2.700968, time duration: 84.192528,
                            number of examples in current reporting: 800, step 14200
                            out of total 10000000


Iteration:  34%|███▍      | 3612/10688 [51:16<1:34:48,  1.24it/s]

timestamp: 21/06/2020 18:43:23, average loss: 2.750604, time duration: 84.219119,
                            number of examples in current reporting: 800, step 14300
                            out of total 10000000


Iteration:  35%|███▍      | 3712/10688 [52:40<1:33:48,  1.24it/s]

timestamp: 21/06/2020 18:44:46, average loss: 2.746550, time duration: 83.736440,
                            number of examples in current reporting: 800, step 14400
                            out of total 10000000


Iteration:  36%|███▌      | 3812/10688 [54:05<1:34:26,  1.21it/s]

timestamp: 21/06/2020 18:46:11, average loss: 2.725342, time duration: 84.383125,
                            number of examples in current reporting: 800, step 14500
                            out of total 10000000


Iteration:  37%|███▋      | 3912/10688 [55:30<1:40:14,  1.13it/s]

timestamp: 21/06/2020 18:47:36, average loss: 2.813879, time duration: 85.321994,
                            number of examples in current reporting: 800, step 14600
                            out of total 10000000


Iteration:  38%|███▊      | 4012/10688 [56:54<1:27:35,  1.27it/s]

timestamp: 21/06/2020 18:49:00, average loss: 2.736946, time duration: 84.268734,
                            number of examples in current reporting: 800, step 14700
                            out of total 10000000


Iteration:  38%|███▊      | 4112/10688 [58:19<1:37:05,  1.13it/s]

timestamp: 21/06/2020 18:50:25, average loss: 2.757743, time duration: 84.706205,
                            number of examples in current reporting: 800, step 14800
                            out of total 10000000


Iteration:  39%|███▉      | 4212/10688 [59:43<1:28:36,  1.22it/s]

timestamp: 21/06/2020 18:51:50, average loss: 2.727569, time duration: 84.493729,
                            number of examples in current reporting: 800, step 14900
                            out of total 10000000


Iteration:  40%|████      | 4311/10688 [1:01:05<1:30:05,  1.18it/s]

timestamp: 21/06/2020 18:53:12, average loss: 2.673245, time duration: 82.516978,
                            number of examples in current reporting: 800, step 15000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  41%|████▏     | 4412/10688 [1:02:34<1:29:52,  1.16it/s]

timestamp: 21/06/2020 18:54:40, average loss: 2.729916, time duration: 87.664874,
                            number of examples in current reporting: 800, step 15100
                            out of total 10000000


Iteration:  42%|████▏     | 4512/10688 [1:03:57<1:23:46,  1.23it/s]

timestamp: 21/06/2020 18:56:04, average loss: 2.706849, time duration: 83.966745,
                            number of examples in current reporting: 800, step 15200
                            out of total 10000000


Iteration:  43%|████▎     | 4612/10688 [1:05:23<1:29:52,  1.13it/s]

timestamp: 21/06/2020 18:57:29, average loss: 2.795399, time duration: 85.409332,
                            number of examples in current reporting: 800, step 15300
                            out of total 10000000


Iteration:  44%|████▍     | 4712/10688 [1:06:48<1:24:38,  1.18it/s]

timestamp: 21/06/2020 18:58:54, average loss: 2.714901, time duration: 84.959856,
                            number of examples in current reporting: 800, step 15400
                            out of total 10000000


Iteration:  45%|████▌     | 4812/10688 [1:08:12<1:24:54,  1.15it/s]

timestamp: 21/06/2020 19:00:18, average loss: 2.717319, time duration: 84.320523,
                            number of examples in current reporting: 800, step 15500
                            out of total 10000000


Iteration:  46%|████▌     | 4912/10688 [1:09:37<1:22:26,  1.17it/s]

timestamp: 21/06/2020 19:01:43, average loss: 2.734347, time duration: 84.358631,
                            number of examples in current reporting: 800, step 15600
                            out of total 10000000


Iteration:  47%|████▋     | 5012/10688 [1:11:01<1:19:39,  1.19it/s]

timestamp: 21/06/2020 19:03:07, average loss: 2.712254, time duration: 84.472376,
                            number of examples in current reporting: 800, step 15700
                            out of total 10000000


Iteration:  48%|████▊     | 5112/10688 [1:12:25<1:14:35,  1.25it/s]

timestamp: 21/06/2020 19:04:31, average loss: 2.721208, time duration: 84.177985,
                            number of examples in current reporting: 800, step 15800
                            out of total 10000000


Iteration:  49%|████▉     | 5212/10688 [1:13:50<1:13:45,  1.24it/s]

timestamp: 21/06/2020 19:05:56, average loss: 2.778540, time duration: 84.355381,
                            number of examples in current reporting: 800, step 15900
                            out of total 10000000


Iteration:  50%|████▉     | 5311/10688 [1:15:13<1:18:10,  1.15it/s]

timestamp: 21/06/2020 19:07:20, average loss: 2.725090, time duration: 84.083873,
                            number of examples in current reporting: 800, step 16000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  51%|█████     | 5412/10688 [1:16:42<1:11:25,  1.23it/s]

timestamp: 21/06/2020 19:08:48, average loss: 2.776366, time duration: 88.136983,
                            number of examples in current reporting: 800, step 16100
                            out of total 10000000


Iteration:  52%|█████▏    | 5512/10688 [1:18:06<1:08:55,  1.25it/s]

timestamp: 21/06/2020 19:10:12, average loss: 2.709504, time duration: 84.276710,
                            number of examples in current reporting: 800, step 16200
                            out of total 10000000


Iteration:  53%|█████▎    | 5612/10688 [1:19:31<1:13:35,  1.15it/s]

timestamp: 21/06/2020 19:11:37, average loss: 2.711296, time duration: 84.569871,
                            number of examples in current reporting: 800, step 16300
                            out of total 10000000


Iteration:  53%|█████▎    | 5712/10688 [1:20:55<1:06:52,  1.24it/s]

timestamp: 21/06/2020 19:13:01, average loss: 2.703871, time duration: 84.213474,
                            number of examples in current reporting: 800, step 16400
                            out of total 10000000


Iteration:  54%|█████▍    | 5812/10688 [1:22:18<1:07:54,  1.20it/s]

timestamp: 21/06/2020 19:14:24, average loss: 2.724369, time duration: 83.481361,
                            number of examples in current reporting: 800, step 16500
                            out of total 10000000


Iteration:  55%|█████▌    | 5912/10688 [1:23:44<1:07:54,  1.17it/s]

timestamp: 21/06/2020 19:15:50, average loss: 2.723518, time duration: 85.544976,
                            number of examples in current reporting: 800, step 16600
                            out of total 10000000


Iteration:  56%|█████▋    | 6012/10688 [1:25:09<1:05:25,  1.19it/s]

timestamp: 21/06/2020 19:17:15, average loss: 2.717811, time duration: 85.201412,
                            number of examples in current reporting: 800, step 16700
                            out of total 10000000


Iteration:  57%|█████▋    | 6112/10688 [1:26:35<1:00:43,  1.26it/s]

timestamp: 21/06/2020 19:18:41, average loss: 2.687316, time duration: 85.641775,
                            number of examples in current reporting: 800, step 16800
                            out of total 10000000


Iteration:  58%|█████▊    | 6212/10688 [1:28:01<1:07:29,  1.11it/s]

timestamp: 21/06/2020 19:20:07, average loss: 2.779090, time duration: 85.902561,
                            number of examples in current reporting: 800, step 16900
                            out of total 10000000


Iteration:  59%|█████▉    | 6311/10688 [1:29:24<58:32,  1.25it/s]  

timestamp: 21/06/2020 19:21:31, average loss: 2.735061, time duration: 84.628726,
                            number of examples in current reporting: 800, step 17000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  60%|█████▉    | 6412/10688 [1:30:53<1:03:21,  1.12it/s]

timestamp: 21/06/2020 19:22:59, average loss: 2.713546, time duration: 87.655367,
                            number of examples in current reporting: 800, step 17100
                            out of total 10000000


Iteration:  61%|██████    | 6512/10688 [1:32:17<1:00:37,  1.15it/s]

timestamp: 21/06/2020 19:24:23, average loss: 2.726183, time duration: 84.080112,
                            number of examples in current reporting: 800, step 17200
                            out of total 10000000


Iteration:  62%|██████▏   | 6612/10688 [1:33:41<56:12,  1.21it/s]  

timestamp: 21/06/2020 19:25:47, average loss: 2.730938, time duration: 84.100701,
                            number of examples in current reporting: 800, step 17300
                            out of total 10000000


Iteration:  63%|██████▎   | 6712/10688 [1:35:06<58:35,  1.13it/s]  

timestamp: 21/06/2020 19:27:12, average loss: 2.735033, time duration: 84.620475,
                            number of examples in current reporting: 800, step 17400
                            out of total 10000000


Iteration:  64%|██████▎   | 6812/10688 [1:36:29<57:18,  1.13it/s]

timestamp: 21/06/2020 19:28:35, average loss: 2.621594, time duration: 82.831934,
                            number of examples in current reporting: 800, step 17500
                            out of total 10000000


Iteration:  65%|██████▍   | 6912/10688 [1:37:53<53:33,  1.17it/s]

timestamp: 21/06/2020 19:29:59, average loss: 2.702081, time duration: 84.788420,
                            number of examples in current reporting: 800, step 17600
                            out of total 10000000


Iteration:  66%|██████▌   | 7012/10688 [1:39:17<53:52,  1.14it/s]

timestamp: 21/06/2020 19:31:24, average loss: 2.711064, time duration: 84.191604,
                            number of examples in current reporting: 800, step 17700
                            out of total 10000000


Iteration:  67%|██████▋   | 7112/10688 [1:40:41<48:46,  1.22it/s]

timestamp: 21/06/2020 19:32:47, average loss: 2.696773, time duration: 83.591737,
                            number of examples in current reporting: 800, step 17800
                            out of total 10000000


Iteration:  67%|██████▋   | 7212/10688 [1:42:06<49:03,  1.18it/s]

timestamp: 21/06/2020 19:34:12, average loss: 2.723877, time duration: 85.061502,
                            number of examples in current reporting: 800, step 17900
                            out of total 10000000


Iteration:  68%|██████▊   | 7311/10688 [1:43:31<50:59,  1.10it/s]

timestamp: 21/06/2020 19:35:38, average loss: 2.652750, time duration: 85.803969,
                            number of examples in current reporting: 800, step 18000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  69%|██████▉   | 7412/10688 [1:45:00<45:06,  1.21it/s]  

timestamp: 21/06/2020 19:37:06, average loss: 2.610580, time duration: 87.838497,
                            number of examples in current reporting: 800, step 18100
                            out of total 10000000


Iteration:  70%|███████   | 7512/10688 [1:46:24<44:17,  1.19it/s]

timestamp: 21/06/2020 19:38:30, average loss: 2.643901, time duration: 83.908080,
                            number of examples in current reporting: 800, step 18200
                            out of total 10000000


Iteration:  71%|███████   | 7612/10688 [1:47:48<42:26,  1.21it/s]

timestamp: 21/06/2020 19:39:55, average loss: 2.707614, time duration: 84.722857,
                            number of examples in current reporting: 800, step 18300
                            out of total 10000000


Iteration:  72%|███████▏  | 7712/10688 [1:49:13<41:01,  1.21it/s]

timestamp: 21/06/2020 19:41:20, average loss: 2.634478, time duration: 84.913711,
                            number of examples in current reporting: 800, step 18400
                            out of total 10000000


Iteration:  73%|███████▎  | 7812/10688 [1:50:38<41:23,  1.16it/s]

timestamp: 21/06/2020 19:42:44, average loss: 2.671025, time duration: 84.949089,
                            number of examples in current reporting: 800, step 18500
                            out of total 10000000


Iteration:  74%|███████▍  | 7912/10688 [1:52:03<38:29,  1.20it/s]

timestamp: 21/06/2020 19:44:10, average loss: 2.684195, time duration: 85.154062,
                            number of examples in current reporting: 800, step 18600
                            out of total 10000000


Iteration:  75%|███████▍  | 8012/10688 [1:53:28<41:07,  1.08it/s]

timestamp: 21/06/2020 19:45:34, average loss: 2.667479, time duration: 84.675082,
                            number of examples in current reporting: 800, step 18700
                            out of total 10000000


Iteration:  76%|███████▌  | 8112/10688 [1:54:52<35:47,  1.20it/s]

timestamp: 21/06/2020 19:46:59, average loss: 2.628100, time duration: 84.335264,
                            number of examples in current reporting: 800, step 18800
                            out of total 10000000


Iteration:  77%|███████▋  | 8212/10688 [1:56:18<37:34,  1.10it/s]

timestamp: 21/06/2020 19:48:24, average loss: 2.654224, time duration: 85.760911,
                            number of examples in current reporting: 800, step 18900
                            out of total 10000000


Iteration:  78%|███████▊  | 8311/10688 [1:57:43<35:53,  1.10it/s]

timestamp: 21/06/2020 19:49:50, average loss: 2.682046, time duration: 85.357006,
                            number of examples in current reporting: 800, step 19000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  79%|███████▊  | 8412/10688 [1:59:12<32:55,  1.15it/s]  

timestamp: 21/06/2020 19:51:18, average loss: 2.631992, time duration: 88.596542,
                            number of examples in current reporting: 800, step 19100
                            out of total 10000000


Iteration:  80%|███████▉  | 8512/10688 [2:00:38<32:31,  1.11it/s]

timestamp: 21/06/2020 19:52:44, average loss: 2.690182, time duration: 85.734212,
                            number of examples in current reporting: 800, step 19200
                            out of total 10000000


Iteration:  81%|████████  | 8612/10688 [2:02:03<30:22,  1.14it/s]

timestamp: 21/06/2020 19:54:09, average loss: 2.607163, time duration: 85.283953,
                            number of examples in current reporting: 800, step 19300
                            out of total 10000000


Iteration:  82%|████████▏ | 8712/10688 [2:03:29<30:36,  1.08it/s]

timestamp: 21/06/2020 19:55:35, average loss: 2.676511, time duration: 85.661181,
                            number of examples in current reporting: 800, step 19400
                            out of total 10000000


Iteration:  82%|████████▏ | 8812/10688 [2:04:53<25:22,  1.23it/s]

timestamp: 21/06/2020 19:57:00, average loss: 2.632865, time duration: 84.549492,
                            number of examples in current reporting: 800, step 19500
                            out of total 10000000


Iteration:  83%|████████▎ | 8912/10688 [2:06:19<24:56,  1.19it/s]

timestamp: 21/06/2020 19:58:25, average loss: 2.722771, time duration: 85.750759,
                            number of examples in current reporting: 800, step 19600
                            out of total 10000000


Iteration:  84%|████████▍ | 9012/10688 [2:07:46<24:48,  1.13it/s]

timestamp: 21/06/2020 19:59:52, average loss: 2.623039, time duration: 86.803579,
                            number of examples in current reporting: 800, step 19700
                            out of total 10000000


Iteration:  85%|████████▌ | 9112/10688 [2:09:10<23:19,  1.13it/s]

timestamp: 21/06/2020 20:01:17, average loss: 2.628073, time duration: 84.528291,
                            number of examples in current reporting: 800, step 19800
                            out of total 10000000


Iteration:  86%|████████▌ | 9212/10688 [2:10:34<19:42,  1.25it/s]

timestamp: 21/06/2020 20:02:40, average loss: 2.699287, time duration: 83.352118,
                            number of examples in current reporting: 800, step 19900
                            out of total 10000000


Iteration:  87%|████████▋ | 9311/10688 [2:11:57<20:12,  1.14it/s]

timestamp: 21/06/2020 20:04:04, average loss: 2.590660, time duration: 84.304861,
                            number of examples in current reporting: 800, step 20000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  88%|████████▊ | 9412/10688 [2:13:25<18:16,  1.16it/s]

timestamp: 21/06/2020 20:05:31, average loss: 2.621717, time duration: 87.143769,
                            number of examples in current reporting: 800, step 20100
                            out of total 10000000


Iteration:  89%|████████▉ | 9512/10688 [2:14:50<16:48,  1.17it/s]

timestamp: 21/06/2020 20:06:56, average loss: 2.608444, time duration: 84.458990,
                            number of examples in current reporting: 800, step 20200
                            out of total 10000000


Iteration:  90%|████████▉ | 9612/10688 [2:16:15<15:02,  1.19it/s]

timestamp: 21/06/2020 20:08:21, average loss: 2.592614, time duration: 85.011194,
                            number of examples in current reporting: 800, step 20300
                            out of total 10000000


Iteration:  91%|█████████ | 9712/10688 [2:17:40<13:48,  1.18it/s]

timestamp: 21/06/2020 20:09:47, average loss: 2.616524, time duration: 85.685039,
                            number of examples in current reporting: 800, step 20400
                            out of total 10000000


Iteration:  92%|█████████▏| 9812/10688 [2:19:10<12:55,  1.13it/s]

timestamp: 21/06/2020 20:11:16, average loss: 2.619779, time duration: 89.851303,
                            number of examples in current reporting: 800, step 20500
                            out of total 10000000


Iteration:  93%|█████████▎| 9912/10688 [2:20:35<10:49,  1.19it/s]

timestamp: 21/06/2020 20:12:42, average loss: 2.665796, time duration: 85.140341,
                            number of examples in current reporting: 800, step 20600
                            out of total 10000000


Iteration:  94%|█████████▎| 10012/10688 [2:22:00<09:58,  1.13it/s]

timestamp: 21/06/2020 20:14:06, average loss: 2.632045, time duration: 84.621323,
                            number of examples in current reporting: 800, step 20700
                            out of total 10000000


Iteration:  95%|█████████▍| 10112/10688 [2:23:24<07:48,  1.23it/s]

timestamp: 21/06/2020 20:15:30, average loss: 2.582344, time duration: 83.680613,
                            number of examples in current reporting: 800, step 20800
                            out of total 10000000


Iteration:  96%|█████████▌| 10212/10688 [2:24:48<06:29,  1.22it/s]

timestamp: 21/06/2020 20:16:54, average loss: 2.619448, time duration: 84.156378,
                            number of examples in current reporting: 800, step 20900
                            out of total 10000000


Iteration:  96%|█████████▋| 10311/10688 [2:26:13<05:52,  1.07it/s]

timestamp: 21/06/2020 20:18:20, average loss: 2.656733, time duration: 85.619319,
                            number of examples in current reporting: 800, step 21000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  97%|█████████▋| 10412/10688 [2:27:42<03:41,  1.25it/s]

timestamp: 21/06/2020 20:19:48, average loss: 2.590177, time duration: 88.602079,
                            number of examples in current reporting: 800, step 21100
                            out of total 10000000


Iteration:  98%|█████████▊| 10512/10688 [2:29:08<02:33,  1.14it/s]

timestamp: 21/06/2020 20:21:14, average loss: 2.645581, time duration: 85.502393,
                            number of examples in current reporting: 800, step 21200
                            out of total 10000000


Iteration:  99%|█████████▉| 10612/10688 [2:30:32<01:02,  1.21it/s]

timestamp: 21/06/2020 20:22:38, average loss: 2.573170, time duration: 84.132354,
                            number of examples in current reporting: 800, step 21300
                            out of total 10000000


Iteration: 100%|██████████| 10688/10688 [2:31:37<00:00,  1.17it/s]
Iteration:   0%|          | 24/10688 [00:20<2:35:48,  1.14it/s]

timestamp: 21/06/2020 20:24:04, average loss: 2.570420, time duration: 85.587879,
                            number of examples in current reporting: 796, step 21400
                            out of total 10000000


Iteration:   1%|          | 124/10688 [01:45<2:18:24,  1.27it/s]

timestamp: 21/06/2020 20:25:29, average loss: 2.526835, time duration: 85.330886,
                            number of examples in current reporting: 800, step 21500
                            out of total 10000000


Iteration:   2%|▏         | 224/10688 [03:12<2:43:16,  1.07it/s]

timestamp: 21/06/2020 20:26:55, average loss: 2.487799, time duration: 86.425371,
                            number of examples in current reporting: 800, step 21600
                            out of total 10000000


Iteration:   3%|▎         | 324/10688 [04:39<2:30:45,  1.15it/s]

timestamp: 21/06/2020 20:28:22, average loss: 2.529675, time duration: 86.869046,
                            number of examples in current reporting: 800, step 21700
                            out of total 10000000


Iteration:   4%|▍         | 424/10688 [06:04<2:21:21,  1.21it/s]

timestamp: 21/06/2020 20:29:47, average loss: 2.528082, time duration: 85.177375,
                            number of examples in current reporting: 800, step 21800
                            out of total 10000000


Iteration:   5%|▍         | 524/10688 [07:30<2:25:17,  1.17it/s]

timestamp: 21/06/2020 20:31:13, average loss: 2.499078, time duration: 85.666011,
                            number of examples in current reporting: 800, step 21900
                            out of total 10000000


Iteration:   6%|▌         | 623/10688 [08:54<2:29:01,  1.13it/s]

timestamp: 21/06/2020 20:32:38, average loss: 2.508328, time duration: 84.904067,
                            number of examples in current reporting: 800, step 22000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:   7%|▋         | 724/10688 [10:22<2:23:34,  1.16it/s]

timestamp: 21/06/2020 20:34:06, average loss: 2.524562, time duration: 87.948850,
                            number of examples in current reporting: 800, step 22100
                            out of total 10000000


Iteration:   8%|▊         | 824/10688 [11:47<2:26:26,  1.12it/s]

timestamp: 21/06/2020 20:35:31, average loss: 2.522268, time duration: 84.760322,
                            number of examples in current reporting: 800, step 22200
                            out of total 10000000


Iteration:   9%|▊         | 924/10688 [13:14<2:22:39,  1.14it/s]

timestamp: 21/06/2020 20:36:57, average loss: 2.567595, time duration: 86.423012,
                            number of examples in current reporting: 800, step 22300
                            out of total 10000000


Iteration:  10%|▉         | 1024/10688 [14:40<2:18:51,  1.16it/s]

timestamp: 21/06/2020 20:38:24, average loss: 2.580429, time duration: 86.811834,
                            number of examples in current reporting: 800, step 22400
                            out of total 10000000


Iteration:  11%|█         | 1124/10688 [16:06<2:18:41,  1.15it/s]

timestamp: 21/06/2020 20:39:49, average loss: 2.440320, time duration: 85.099608,
                            number of examples in current reporting: 800, step 22500
                            out of total 10000000


Iteration:  11%|█▏        | 1224/10688 [17:30<2:16:44,  1.15it/s]

timestamp: 21/06/2020 20:41:14, average loss: 2.551640, time duration: 84.945274,
                            number of examples in current reporting: 800, step 22600
                            out of total 10000000


Iteration:  12%|█▏        | 1324/10688 [18:56<2:12:17,  1.18it/s]

timestamp: 21/06/2020 20:42:39, average loss: 2.496512, time duration: 85.490177,
                            number of examples in current reporting: 800, step 22700
                            out of total 10000000


Iteration:  13%|█▎        | 1424/10688 [20:21<2:08:53,  1.20it/s]

timestamp: 21/06/2020 20:44:04, average loss: 2.511013, time duration: 84.778677,
                            number of examples in current reporting: 800, step 22800
                            out of total 10000000


Iteration:  14%|█▍        | 1524/10688 [21:46<2:12:03,  1.16it/s]

timestamp: 21/06/2020 20:45:30, average loss: 2.537477, time duration: 85.718889,
                            number of examples in current reporting: 800, step 22900
                            out of total 10000000


Iteration:  15%|█▌        | 1623/10688 [23:11<2:08:59,  1.17it/s]

timestamp: 21/06/2020 20:46:55, average loss: 2.465832, time duration: 85.587128,
                            number of examples in current reporting: 800, step 23000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  16%|█▌        | 1724/10688 [24:41<2:01:32,  1.23it/s]

timestamp: 21/06/2020 20:48:25, average loss: 2.427743, time duration: 89.082573,
                            number of examples in current reporting: 800, step 23100
                            out of total 10000000


Iteration:  17%|█▋        | 1824/10688 [26:06<2:02:04,  1.21it/s]

timestamp: 21/06/2020 20:49:50, average loss: 2.470236, time duration: 85.286592,
                            number of examples in current reporting: 800, step 23200
                            out of total 10000000


Iteration:  18%|█▊        | 1924/10688 [27:32<2:04:19,  1.17it/s]

timestamp: 21/06/2020 20:51:15, average loss: 2.502007, time duration: 85.201343,
                            number of examples in current reporting: 800, step 23300
                            out of total 10000000


Iteration:  19%|█▉        | 2024/10688 [28:56<2:00:45,  1.20it/s]

timestamp: 21/06/2020 20:52:39, average loss: 2.464475, time duration: 84.163733,
                            number of examples in current reporting: 800, step 23400
                            out of total 10000000


Iteration:  20%|█▉        | 2124/10688 [30:21<2:07:24,  1.12it/s]

timestamp: 21/06/2020 20:54:04, average loss: 2.530646, time duration: 84.984980,
                            number of examples in current reporting: 800, step 23500
                            out of total 10000000


Iteration:  21%|██        | 2224/10688 [31:45<2:00:35,  1.17it/s]

timestamp: 21/06/2020 20:55:29, average loss: 2.486581, time duration: 84.455717,
                            number of examples in current reporting: 800, step 23600
                            out of total 10000000


Iteration:  22%|██▏       | 2324/10688 [33:09<2:04:24,  1.12it/s]

timestamp: 21/06/2020 20:56:52, average loss: 2.490522, time duration: 83.515179,
                            number of examples in current reporting: 800, step 23700
                            out of total 10000000


Iteration:  23%|██▎       | 2424/10688 [34:35<1:57:07,  1.18it/s]

timestamp: 21/06/2020 20:58:18, average loss: 2.506614, time duration: 85.814375,
                            number of examples in current reporting: 800, step 23800
                            out of total 10000000


Iteration:  24%|██▎       | 2524/10688 [36:00<1:56:25,  1.17it/s]

timestamp: 21/06/2020 20:59:44, average loss: 2.535039, time duration: 85.916771,
                            number of examples in current reporting: 800, step 23900
                            out of total 10000000


Iteration:  25%|██▍       | 2623/10688 [37:24<2:03:41,  1.09it/s]

timestamp: 21/06/2020 21:01:08, average loss: 2.466983, time duration: 84.015930,
                            number of examples in current reporting: 800, step 24000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  25%|██▌       | 2724/10688 [38:53<1:51:46,  1.19it/s]

timestamp: 21/06/2020 21:02:36, average loss: 2.458555, time duration: 88.190394,
                            number of examples in current reporting: 800, step 24100
                            out of total 10000000


Iteration:  26%|██▋       | 2824/10688 [40:17<1:51:49,  1.17it/s]

timestamp: 21/06/2020 21:04:01, average loss: 2.506809, time duration: 84.690728,
                            number of examples in current reporting: 800, step 24200
                            out of total 10000000


Iteration:  27%|██▋       | 2924/10688 [41:41<1:49:03,  1.19it/s]

timestamp: 21/06/2020 21:05:24, average loss: 2.519453, time duration: 83.607454,
                            number of examples in current reporting: 800, step 24300
                            out of total 10000000


Iteration:  28%|██▊       | 3024/10688 [43:05<1:48:04,  1.18it/s]

timestamp: 21/06/2020 21:06:49, average loss: 2.575265, time duration: 84.360708,
                            number of examples in current reporting: 800, step 24400
                            out of total 10000000


Iteration:  29%|██▉       | 3124/10688 [44:29<1:40:36,  1.25it/s]

timestamp: 21/06/2020 21:08:13, average loss: 2.494541, time duration: 83.811369,
                            number of examples in current reporting: 800, step 24500
                            out of total 10000000


Iteration:  30%|███       | 3224/10688 [45:53<1:48:55,  1.14it/s]

timestamp: 21/06/2020 21:09:36, average loss: 2.495693, time duration: 83.725361,
                            number of examples in current reporting: 800, step 24600
                            out of total 10000000


Iteration:  31%|███       | 3324/10688 [47:16<1:48:20,  1.13it/s]

timestamp: 21/06/2020 21:11:00, average loss: 2.517869, time duration: 83.567018,
                            number of examples in current reporting: 800, step 24700
                            out of total 10000000


Iteration:  32%|███▏      | 3424/10688 [48:42<1:42:36,  1.18it/s]

timestamp: 21/06/2020 21:12:26, average loss: 2.478224, time duration: 85.746797,
                            number of examples in current reporting: 800, step 24800
                            out of total 10000000


Iteration:  33%|███▎      | 3524/10688 [50:06<1:41:32,  1.18it/s]

timestamp: 21/06/2020 21:13:49, average loss: 2.449308, time duration: 83.760131,
                            number of examples in current reporting: 800, step 24900
                            out of total 10000000


Iteration:  34%|███▍      | 3623/10688 [51:29<1:43:10,  1.14it/s]

timestamp: 21/06/2020 21:15:14, average loss: 2.524291, time duration: 84.220643,
                            number of examples in current reporting: 800, step 25000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  35%|███▍      | 3724/10688 [52:57<1:41:54,  1.14it/s]

timestamp: 21/06/2020 21:16:41, average loss: 2.456483, time duration: 87.080911,
                            number of examples in current reporting: 800, step 25100
                            out of total 10000000


Iteration:  36%|███▌      | 3824/10688 [54:23<1:47:38,  1.06it/s]

timestamp: 21/06/2020 21:18:06, average loss: 2.461060, time duration: 85.702671,
                            number of examples in current reporting: 800, step 25200
                            out of total 10000000


Iteration:  37%|███▋      | 3924/10688 [55:48<1:32:21,  1.22it/s]

timestamp: 21/06/2020 21:19:32, average loss: 2.474454, time duration: 85.152144,
                            number of examples in current reporting: 800, step 25300
                            out of total 10000000


Iteration:  38%|███▊      | 4024/10688 [57:13<1:37:15,  1.14it/s]

timestamp: 21/06/2020 21:20:56, average loss: 2.492758, time duration: 84.819354,
                            number of examples in current reporting: 800, step 25400
                            out of total 10000000


Iteration:  39%|███▊      | 4124/10688 [58:38<1:35:31,  1.15it/s]

timestamp: 21/06/2020 21:22:21, average loss: 2.511429, time duration: 84.658543,
                            number of examples in current reporting: 800, step 25500
                            out of total 10000000


Iteration:  40%|███▉      | 4224/10688 [1:00:03<1:33:05,  1.16it/s]

timestamp: 21/06/2020 21:23:46, average loss: 2.560200, time duration: 85.477604,
                            number of examples in current reporting: 800, step 25600
                            out of total 10000000


Iteration:  40%|████      | 4324/10688 [1:01:28<1:29:58,  1.18it/s]

timestamp: 21/06/2020 21:25:11, average loss: 2.488878, time duration: 84.492754,
                            number of examples in current reporting: 800, step 25700
                            out of total 10000000


Iteration:  41%|████▏     | 4424/10688 [1:02:52<1:28:56,  1.17it/s]

timestamp: 21/06/2020 21:26:35, average loss: 2.486056, time duration: 84.513978,
                            number of examples in current reporting: 800, step 25800
                            out of total 10000000


Iteration:  42%|████▏     | 4524/10688 [1:04:16<1:20:13,  1.28it/s]

timestamp: 21/06/2020 21:28:00, average loss: 2.500905, time duration: 84.314529,
                            number of examples in current reporting: 800, step 25900
                            out of total 10000000


Iteration:  43%|████▎     | 4623/10688 [1:05:39<1:26:27,  1.17it/s]

timestamp: 21/06/2020 21:29:23, average loss: 2.460406, time duration: 82.999093,
                            number of examples in current reporting: 800, step 26000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  44%|████▍     | 4724/10688 [1:07:06<1:29:59,  1.10it/s]

timestamp: 21/06/2020 21:30:50, average loss: 2.503427, time duration: 86.951015,
                            number of examples in current reporting: 800, step 26100
                            out of total 10000000


Iteration:  45%|████▌     | 4824/10688 [1:08:29<1:16:21,  1.28it/s]

timestamp: 21/06/2020 21:32:13, average loss: 2.487526, time duration: 82.853602,
                            number of examples in current reporting: 800, step 26200
                            out of total 10000000


Iteration:  46%|████▌     | 4924/10688 [1:09:52<1:17:18,  1.24it/s]

timestamp: 21/06/2020 21:33:35, average loss: 2.481666, time duration: 82.442348,
                            number of examples in current reporting: 800, step 26300
                            out of total 10000000


Iteration:  47%|████▋     | 5024/10688 [1:11:15<1:24:49,  1.11it/s]

timestamp: 21/06/2020 21:34:58, average loss: 2.508181, time duration: 83.460258,
                            number of examples in current reporting: 800, step 26400
                            out of total 10000000


Iteration:  48%|████▊     | 5124/10688 [1:12:38<1:18:28,  1.18it/s]

timestamp: 21/06/2020 21:36:21, average loss: 2.474461, time duration: 82.941521,
                            number of examples in current reporting: 800, step 26500
                            out of total 10000000


Iteration:  49%|████▉     | 5224/10688 [1:14:01<1:14:06,  1.23it/s]

timestamp: 21/06/2020 21:37:45, average loss: 2.517299, time duration: 83.412009,
                            number of examples in current reporting: 800, step 26600
                            out of total 10000000


Iteration:  50%|████▉     | 5324/10688 [1:15:25<1:17:52,  1.15it/s]

timestamp: 21/06/2020 21:39:08, average loss: 2.464327, time duration: 83.215774,
                            number of examples in current reporting: 800, step 26700
                            out of total 10000000


Iteration:  51%|█████     | 5424/10688 [1:16:49<1:12:01,  1.22it/s]

timestamp: 21/06/2020 21:40:32, average loss: 2.511351, time duration: 84.042882,
                            number of examples in current reporting: 800, step 26800
                            out of total 10000000


Iteration:  52%|█████▏    | 5524/10688 [1:18:13<1:15:26,  1.14it/s]

timestamp: 21/06/2020 21:41:56, average loss: 2.496342, time duration: 83.960845,
                            number of examples in current reporting: 800, step 26900
                            out of total 10000000


Iteration:  53%|█████▎    | 5623/10688 [1:19:36<1:10:42,  1.19it/s]

timestamp: 21/06/2020 21:43:20, average loss: 2.435356, time duration: 83.917904,
                            number of examples in current reporting: 800, step 27000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  54%|█████▎    | 5724/10688 [1:21:04<1:09:44,  1.19it/s]

timestamp: 21/06/2020 21:44:47, average loss: 2.491599, time duration: 87.205815,
                            number of examples in current reporting: 800, step 27100
                            out of total 10000000


Iteration:  54%|█████▍    | 5824/10688 [1:22:28<1:08:10,  1.19it/s]

timestamp: 21/06/2020 21:46:11, average loss: 2.504293, time duration: 83.992459,
                            number of examples in current reporting: 800, step 27200
                            out of total 10000000


Iteration:  55%|█████▌    | 5924/10688 [1:23:52<1:08:36,  1.16it/s]

timestamp: 21/06/2020 21:47:35, average loss: 2.436053, time duration: 83.938356,
                            number of examples in current reporting: 800, step 27300
                            out of total 10000000


Iteration:  56%|█████▋    | 6024/10688 [1:25:17<1:02:20,  1.25it/s]

timestamp: 21/06/2020 21:49:00, average loss: 2.487062, time duration: 85.161288,
                            number of examples in current reporting: 800, step 27400
                            out of total 10000000


Iteration:  57%|█████▋    | 6124/10688 [1:26:41<1:06:18,  1.15it/s]

timestamp: 21/06/2020 21:50:25, average loss: 2.486623, time duration: 84.500480,
                            number of examples in current reporting: 800, step 27500
                            out of total 10000000


Iteration:  58%|█████▊    | 6224/10688 [1:28:05<1:02:27,  1.19it/s]

timestamp: 21/06/2020 21:51:48, average loss: 2.468133, time duration: 83.720033,
                            number of examples in current reporting: 800, step 27600
                            out of total 10000000


Iteration:  59%|█████▉    | 6324/10688 [1:29:29<1:03:16,  1.15it/s]

timestamp: 21/06/2020 21:53:13, average loss: 2.429462, time duration: 84.268177,
                            number of examples in current reporting: 800, step 27700
                            out of total 10000000


Iteration:  60%|██████    | 6424/10688 [1:30:54<58:15,  1.22it/s]  

timestamp: 21/06/2020 21:54:38, average loss: 2.457113, time duration: 85.000114,
                            number of examples in current reporting: 800, step 27800
                            out of total 10000000


Iteration:  61%|██████    | 6524/10688 [1:32:18<1:03:24,  1.09it/s]

timestamp: 21/06/2020 21:56:02, average loss: 2.465309, time duration: 83.790182,
                            number of examples in current reporting: 800, step 27900
                            out of total 10000000


Iteration:  62%|██████▏   | 6623/10688 [1:33:41<55:16,  1.23it/s]  

timestamp: 21/06/2020 21:57:25, average loss: 2.456080, time duration: 83.493096,
                            number of examples in current reporting: 800, step 28000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  63%|██████▎   | 6724/10688 [1:35:09<59:53,  1.10it/s]  

timestamp: 21/06/2020 21:58:52, average loss: 2.475887, time duration: 87.382383,
                            number of examples in current reporting: 800, step 28100
                            out of total 10000000


Iteration:  64%|██████▍   | 6824/10688 [1:36:32<52:46,  1.22it/s]  

timestamp: 21/06/2020 22:00:15, average loss: 2.447535, time duration: 82.595396,
                            number of examples in current reporting: 800, step 28200
                            out of total 10000000


Iteration:  65%|██████▍   | 6924/10688 [1:37:55<56:09,  1.12it/s]

timestamp: 21/06/2020 22:01:39, average loss: 2.459843, time duration: 83.685498,
                            number of examples in current reporting: 800, step 28300
                            out of total 10000000


Iteration:  66%|██████▌   | 7024/10688 [1:39:18<53:54,  1.13it/s]

timestamp: 21/06/2020 22:03:02, average loss: 2.483015, time duration: 82.921607,
                            number of examples in current reporting: 800, step 28400
                            out of total 10000000


Iteration:  67%|██████▋   | 7124/10688 [1:40:42<45:55,  1.29it/s]

timestamp: 21/06/2020 22:04:26, average loss: 2.373428, time duration: 83.926084,
                            number of examples in current reporting: 800, step 28500
                            out of total 10000000


Iteration:  68%|██████▊   | 7224/10688 [1:42:09<50:16,  1.15it/s]

timestamp: 21/06/2020 22:05:52, average loss: 2.481286, time duration: 86.918741,
                            number of examples in current reporting: 800, step 28600
                            out of total 10000000


Iteration:  69%|██████▊   | 7324/10688 [1:43:37<51:17,  1.09it/s]

timestamp: 21/06/2020 22:07:20, average loss: 2.495596, time duration: 87.539367,
                            number of examples in current reporting: 800, step 28700
                            out of total 10000000


Iteration:  69%|██████▉   | 7424/10688 [1:45:02<44:17,  1.23it/s]

timestamp: 21/06/2020 22:08:45, average loss: 2.456701, time duration: 85.226496,
                            number of examples in current reporting: 800, step 28800
                            out of total 10000000


Iteration:  70%|███████   | 7524/10688 [1:46:25<44:48,  1.18it/s]

timestamp: 21/06/2020 22:10:09, average loss: 2.436898, time duration: 83.419364,
                            number of examples in current reporting: 800, step 28900
                            out of total 10000000


Iteration:  71%|███████▏  | 7623/10688 [1:47:50<44:28,  1.15it/s]

timestamp: 21/06/2020 22:11:34, average loss: 2.509859, time duration: 85.737887,
                            number of examples in current reporting: 800, step 29000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  72%|███████▏  | 7724/10688 [1:49:19<40:46,  1.21it/s]  

timestamp: 21/06/2020 22:13:02, average loss: 2.462952, time duration: 87.500593,
                            number of examples in current reporting: 800, step 29100
                            out of total 10000000


Iteration:  73%|███████▎  | 7824/10688 [1:50:43<41:56,  1.14it/s]

timestamp: 21/06/2020 22:14:26, average loss: 2.486795, time duration: 84.002934,
                            number of examples in current reporting: 800, step 29200
                            out of total 10000000


Iteration:  74%|███████▍  | 7924/10688 [1:52:06<37:54,  1.22it/s]

timestamp: 21/06/2020 22:15:50, average loss: 2.443754, time duration: 83.956927,
                            number of examples in current reporting: 800, step 29300
                            out of total 10000000


Iteration:  75%|███████▌  | 8024/10688 [1:53:31<34:40,  1.28it/s]

timestamp: 21/06/2020 22:17:14, average loss: 2.427903, time duration: 84.144092,
                            number of examples in current reporting: 800, step 29400
                            out of total 10000000


Iteration:  76%|███████▌  | 8124/10688 [1:54:54<36:21,  1.18it/s]

timestamp: 21/06/2020 22:18:38, average loss: 2.395503, time duration: 83.827997,
                            number of examples in current reporting: 800, step 29500
                            out of total 10000000


Iteration:  77%|███████▋  | 8224/10688 [1:56:18<36:22,  1.13it/s]

timestamp: 21/06/2020 22:20:01, average loss: 2.449833, time duration: 83.617364,
                            number of examples in current reporting: 800, step 29600
                            out of total 10000000


Iteration:  78%|███████▊  | 8324/10688 [1:57:42<31:41,  1.24it/s]

timestamp: 21/06/2020 22:21:26, average loss: 2.446927, time duration: 84.056325,
                            number of examples in current reporting: 800, step 29700
                            out of total 10000000


Iteration:  79%|███████▉  | 8424/10688 [1:59:07<33:00,  1.14it/s]

timestamp: 21/06/2020 22:22:50, average loss: 2.474689, time duration: 84.992836,
                            number of examples in current reporting: 800, step 29800
                            out of total 10000000


Iteration:  80%|███████▉  | 8524/10688 [2:00:32<28:25,  1.27it/s]

timestamp: 21/06/2020 22:24:15, average loss: 2.439420, time duration: 84.708083,
                            number of examples in current reporting: 800, step 29900
                            out of total 10000000


Iteration:  81%|████████  | 8623/10688 [2:01:56<28:22,  1.21it/s]

timestamp: 21/06/2020 22:25:41, average loss: 2.442190, time duration: 85.633194,
                            number of examples in current reporting: 800, step 30000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  82%|████████▏ | 8724/10688 [2:03:25<26:54,  1.22it/s]  

timestamp: 21/06/2020 22:27:08, average loss: 2.439210, time duration: 87.575520,
                            number of examples in current reporting: 800, step 30100
                            out of total 10000000


Iteration:  83%|████████▎ | 8824/10688 [2:04:49<27:33,  1.13it/s]

timestamp: 21/06/2020 22:28:33, average loss: 2.451153, time duration: 84.309489,
                            number of examples in current reporting: 800, step 30200
                            out of total 10000000


Iteration:  83%|████████▎ | 8924/10688 [2:06:14<25:51,  1.14it/s]

timestamp: 21/06/2020 22:29:58, average loss: 2.433116, time duration: 84.876940,
                            number of examples in current reporting: 800, step 30300
                            out of total 10000000


Iteration:  84%|████████▍ | 9024/10688 [2:07:41<22:53,  1.21it/s]

timestamp: 21/06/2020 22:31:24, average loss: 2.438382, time duration: 86.585342,
                            number of examples in current reporting: 800, step 30400
                            out of total 10000000


Iteration:  85%|████████▌ | 9124/10688 [2:09:06<23:21,  1.12it/s]

timestamp: 21/06/2020 22:32:49, average loss: 2.494403, time duration: 84.731213,
                            number of examples in current reporting: 800, step 30500
                            out of total 10000000


Iteration:  86%|████████▋ | 9224/10688 [2:10:28<21:30,  1.13it/s]

timestamp: 21/06/2020 22:34:12, average loss: 2.441319, time duration: 82.949082,
                            number of examples in current reporting: 800, step 30600
                            out of total 10000000


Iteration:  87%|████████▋ | 9324/10688 [2:11:52<18:09,  1.25it/s]

timestamp: 21/06/2020 22:35:35, average loss: 2.473971, time duration: 83.454679,
                            number of examples in current reporting: 800, step 30700
                            out of total 10000000


Iteration:  88%|████████▊ | 9424/10688 [2:13:16<18:31,  1.14it/s]

timestamp: 21/06/2020 22:36:59, average loss: 2.517826, time duration: 83.955927,
                            number of examples in current reporting: 800, step 30800
                            out of total 10000000


Iteration:  89%|████████▉ | 9524/10688 [2:14:40<16:26,  1.18it/s]

timestamp: 21/06/2020 22:38:23, average loss: 2.437356, time duration: 84.120193,
                            number of examples in current reporting: 800, step 30900
                            out of total 10000000


Iteration:  90%|█████████ | 9623/10688 [2:16:04<15:00,  1.18it/s]

timestamp: 21/06/2020 22:39:48, average loss: 2.421666, time duration: 84.572402,
                            number of examples in current reporting: 800, step 31000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  91%|█████████ | 9724/10688 [2:17:33<13:34,  1.18it/s]

timestamp: 21/06/2020 22:41:16, average loss: 2.492947, time duration: 88.449464,
                            number of examples in current reporting: 800, step 31100
                            out of total 10000000


Iteration:  92%|█████████▏| 9824/10688 [2:18:59<11:43,  1.23it/s]

timestamp: 21/06/2020 22:42:42, average loss: 2.467021, time duration: 86.003814,
                            number of examples in current reporting: 800, step 31200
                            out of total 10000000


Iteration:  93%|█████████▎| 9924/10688 [2:20:23<10:33,  1.21it/s]

timestamp: 21/06/2020 22:44:06, average loss: 2.460762, time duration: 84.044976,
                            number of examples in current reporting: 800, step 31300
                            out of total 10000000


Iteration:  94%|█████████▍| 10024/10688 [2:21:48<09:10,  1.21it/s]

timestamp: 21/06/2020 22:45:32, average loss: 2.477708, time duration: 85.139578,
                            number of examples in current reporting: 800, step 31400
                            out of total 10000000


Iteration:  95%|█████████▍| 10124/10688 [2:23:12<08:25,  1.12it/s]

timestamp: 21/06/2020 22:46:55, average loss: 2.388995, time duration: 83.564590,
                            number of examples in current reporting: 800, step 31500
                            out of total 10000000


Iteration:  96%|█████████▌| 10224/10688 [2:24:39<06:49,  1.13it/s]

timestamp: 21/06/2020 22:48:22, average loss: 2.411320, time duration: 86.845581,
                            number of examples in current reporting: 800, step 31600
                            out of total 10000000


Iteration:  97%|█████████▋| 10324/10688 [2:26:04<05:14,  1.16it/s]

timestamp: 21/06/2020 22:49:47, average loss: 2.393788, time duration: 85.396627,
                            number of examples in current reporting: 800, step 31700
                            out of total 10000000


Iteration:  98%|█████████▊| 10424/10688 [2:27:30<03:58,  1.11it/s]

timestamp: 21/06/2020 22:51:13, average loss: 2.413215, time duration: 85.570926,
                            number of examples in current reporting: 800, step 31800
                            out of total 10000000


Iteration:  98%|█████████▊| 10524/10688 [2:28:54<02:13,  1.23it/s]

timestamp: 21/06/2020 22:52:37, average loss: 2.383495, time duration: 83.986770,
                            number of examples in current reporting: 800, step 31900
                            out of total 10000000


Iteration:  99%|█████████▉| 10623/10688 [2:30:18<00:55,  1.18it/s]

timestamp: 21/06/2020 22:54:02, average loss: 2.407015, time duration: 85.115900,
                            number of examples in current reporting: 800, step 32000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration: 100%|██████████| 10688/10688 [2:31:16<00:00,  1.18it/s]
Iteration:   0%|          | 36/10688 [00:30<2:34:04,  1.15it/s]

timestamp: 21/06/2020 22:55:30, average loss: 2.430330, time duration: 88.194683,
                            number of examples in current reporting: 796, step 32100
                            out of total 10000000


Iteration:   1%|▏         | 136/10688 [01:54<2:32:17,  1.15it/s]

timestamp: 21/06/2020 22:56:54, average loss: 2.314351, time duration: 83.575582,
                            number of examples in current reporting: 800, step 32200
                            out of total 10000000


Iteration:   2%|▏         | 236/10688 [03:18<2:21:17,  1.23it/s]

timestamp: 21/06/2020 22:58:18, average loss: 2.329248, time duration: 84.399785,
                            number of examples in current reporting: 800, step 32300
                            out of total 10000000


Iteration:   3%|▎         | 336/10688 [04:43<2:26:20,  1.18it/s]

timestamp: 21/06/2020 22:59:43, average loss: 2.315874, time duration: 84.490288,
                            number of examples in current reporting: 800, step 32400
                            out of total 10000000


Iteration:   4%|▍         | 436/10688 [06:10<2:20:02,  1.22it/s]

timestamp: 21/06/2020 23:01:10, average loss: 2.327760, time duration: 87.131066,
                            number of examples in current reporting: 800, step 32500
                            out of total 10000000


Iteration:   5%|▌         | 536/10688 [07:34<2:31:11,  1.12it/s]

timestamp: 21/06/2020 23:02:34, average loss: 2.312927, time duration: 84.370171,
                            number of examples in current reporting: 800, step 32600
                            out of total 10000000


Iteration:   6%|▌         | 636/10688 [08:58<2:26:28,  1.14it/s]

timestamp: 21/06/2020 23:03:58, average loss: 2.324959, time duration: 83.405778,
                            number of examples in current reporting: 800, step 32700
                            out of total 10000000


Iteration:   7%|▋         | 736/10688 [10:21<2:12:31,  1.25it/s]

timestamp: 21/06/2020 23:05:21, average loss: 2.386307, time duration: 83.204655,
                            number of examples in current reporting: 800, step 32800
                            out of total 10000000


Iteration:   8%|▊         | 836/10688 [11:44<2:06:12,  1.30it/s]

timestamp: 21/06/2020 23:06:43, average loss: 2.283154, time duration: 82.634136,
                            number of examples in current reporting: 800, step 32900
                            out of total 10000000


Iteration:   9%|▊         | 935/10688 [13:06<2:20:57,  1.15it/s]

timestamp: 21/06/2020 23:08:07, average loss: 2.383008, time duration: 83.423925,
                            number of examples in current reporting: 800, step 33000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  10%|▉         | 1036/10688 [14:33<2:08:32,  1.25it/s]

timestamp: 21/06/2020 23:09:33, average loss: 2.354399, time duration: 86.482815,
                            number of examples in current reporting: 800, step 33100
                            out of total 10000000


Iteration:  11%|█         | 1136/10688 [15:57<2:13:26,  1.19it/s]

timestamp: 21/06/2020 23:10:57, average loss: 2.371758, time duration: 83.456226,
                            number of examples in current reporting: 800, step 33200
                            out of total 10000000


Iteration:  12%|█▏        | 1236/10688 [17:20<2:14:03,  1.18it/s]

timestamp: 21/06/2020 23:12:20, average loss: 2.358568, time duration: 83.251960,
                            number of examples in current reporting: 800, step 33300
                            out of total 10000000


Iteration:  12%|█▎        | 1336/10688 [18:44<2:12:58,  1.17it/s]

timestamp: 21/06/2020 23:13:44, average loss: 2.347184, time duration: 84.209346,
                            number of examples in current reporting: 800, step 33400
                            out of total 10000000


Iteration:  13%|█▎        | 1436/10688 [20:09<2:08:16,  1.20it/s]

timestamp: 21/06/2020 23:15:09, average loss: 2.315324, time duration: 84.391068,
                            number of examples in current reporting: 800, step 33500
                            out of total 10000000


Iteration:  14%|█▍        | 1536/10688 [21:33<2:12:21,  1.15it/s]

timestamp: 21/06/2020 23:16:33, average loss: 2.337495, time duration: 84.077943,
                            number of examples in current reporting: 800, step 33600
                            out of total 10000000


Iteration:  15%|█▌        | 1636/10688 [22:56<2:05:55,  1.20it/s]

timestamp: 21/06/2020 23:17:56, average loss: 2.287019, time duration: 83.423697,
                            number of examples in current reporting: 800, step 33700
                            out of total 10000000


Iteration:  16%|█▌        | 1736/10688 [24:20<2:05:52,  1.19it/s]

timestamp: 21/06/2020 23:19:20, average loss: 2.318268, time duration: 83.420710,
                            number of examples in current reporting: 800, step 33800
                            out of total 10000000


Iteration:  17%|█▋        | 1836/10688 [25:44<2:09:01,  1.14it/s]

timestamp: 21/06/2020 23:20:44, average loss: 2.363392, time duration: 84.310149,
                            number of examples in current reporting: 800, step 33900
                            out of total 10000000


Iteration:  18%|█▊        | 1935/10688 [27:07<2:04:58,  1.17it/s]

timestamp: 21/06/2020 23:22:07, average loss: 2.331559, time duration: 83.445573,
                            number of examples in current reporting: 800, step 34000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  19%|█▉        | 2036/10688 [28:33<1:52:58,  1.28it/s]

timestamp: 21/06/2020 23:23:33, average loss: 2.312310, time duration: 85.642166,
                            number of examples in current reporting: 800, step 34100
                            out of total 10000000


Iteration:  20%|█▉        | 2136/10688 [29:59<2:02:15,  1.17it/s]

timestamp: 21/06/2020 23:24:59, average loss: 2.382260, time duration: 85.633188,
                            number of examples in current reporting: 800, step 34200
                            out of total 10000000


Iteration:  21%|██        | 2236/10688 [31:22<1:54:20,  1.23it/s]

timestamp: 21/06/2020 23:26:22, average loss: 2.339833, time duration: 83.608186,
                            number of examples in current reporting: 800, step 34300
                            out of total 10000000


Iteration:  22%|██▏       | 2336/10688 [32:47<2:05:09,  1.11it/s]

timestamp: 21/06/2020 23:27:47, average loss: 2.323160, time duration: 84.293330,
                            number of examples in current reporting: 800, step 34400
                            out of total 10000000


Iteration:  23%|██▎       | 2436/10688 [34:11<1:55:41,  1.19it/s]

timestamp: 21/06/2020 23:29:10, average loss: 2.310408, time duration: 83.898089,
                            number of examples in current reporting: 800, step 34500
                            out of total 10000000


Iteration:  24%|██▎       | 2536/10688 [35:34<1:41:42,  1.34it/s]

timestamp: 21/06/2020 23:30:33, average loss: 2.350141, time duration: 83.018173,
                            number of examples in current reporting: 800, step 34600
                            out of total 10000000


Iteration:  25%|██▍       | 2636/10688 [36:58<1:54:47,  1.17it/s]

timestamp: 21/06/2020 23:31:58, average loss: 2.291964, time duration: 84.330747,
                            number of examples in current reporting: 800, step 34700
                            out of total 10000000


Iteration:  26%|██▌       | 2736/10688 [38:22<1:56:48,  1.13it/s]

timestamp: 21/06/2020 23:33:22, average loss: 2.384277, time duration: 84.412579,
                            number of examples in current reporting: 800, step 34800
                            out of total 10000000


Iteration:  27%|██▋       | 2836/10688 [39:46<1:53:01,  1.16it/s]

timestamp: 21/06/2020 23:34:46, average loss: 2.335478, time duration: 84.177699,
                            number of examples in current reporting: 800, step 34900
                            out of total 10000000


Iteration:  27%|██▋       | 2935/10688 [41:09<1:51:29,  1.16it/s]

timestamp: 21/06/2020 23:36:10, average loss: 2.362918, time duration: 83.568298,
                            number of examples in current reporting: 800, step 35000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  28%|██▊       | 3036/10688 [42:38<1:45:44,  1.21it/s]

timestamp: 21/06/2020 23:37:38, average loss: 2.378614, time duration: 88.049507,
                            number of examples in current reporting: 800, step 35100
                            out of total 10000000


Iteration:  29%|██▉       | 3136/10688 [44:02<1:48:42,  1.16it/s]

timestamp: 21/06/2020 23:39:02, average loss: 2.253930, time duration: 84.331659,
                            number of examples in current reporting: 800, step 35200
                            out of total 10000000


Iteration:  30%|███       | 3236/10688 [45:26<1:37:54,  1.27it/s]

timestamp: 21/06/2020 23:40:26, average loss: 2.306042, time duration: 83.919618,
                            number of examples in current reporting: 800, step 35300
                            out of total 10000000


Iteration:  31%|███       | 3336/10688 [46:51<1:37:45,  1.25it/s]

timestamp: 21/06/2020 23:41:51, average loss: 2.306435, time duration: 84.433367,
                            number of examples in current reporting: 800, step 35400
                            out of total 10000000


Iteration:  32%|███▏      | 3436/10688 [48:15<1:39:24,  1.22it/s]

timestamp: 21/06/2020 23:43:15, average loss: 2.318024, time duration: 84.596641,
                            number of examples in current reporting: 800, step 35500
                            out of total 10000000


Iteration:  33%|███▎      | 3536/10688 [49:41<1:38:06,  1.21it/s]

timestamp: 21/06/2020 23:44:41, average loss: 2.354188, time duration: 85.256025,
                            number of examples in current reporting: 800, step 35600
                            out of total 10000000


Iteration:  34%|███▍      | 3636/10688 [51:04<1:31:19,  1.29it/s]

timestamp: 21/06/2020 23:46:04, average loss: 2.322191, time duration: 83.339225,
                            number of examples in current reporting: 800, step 35700
                            out of total 10000000


Iteration:  35%|███▍      | 3736/10688 [52:31<1:44:26,  1.11it/s]

timestamp: 21/06/2020 23:47:31, average loss: 2.396730, time duration: 86.715295,
                            number of examples in current reporting: 800, step 35800
                            out of total 10000000


Iteration:  36%|███▌      | 3836/10688 [53:54<1:36:18,  1.19it/s]

timestamp: 21/06/2020 23:48:54, average loss: 2.285822, time duration: 83.537993,
                            number of examples in current reporting: 800, step 35900
                            out of total 10000000


Iteration:  37%|███▋      | 3935/10688 [55:17<1:32:24,  1.22it/s]

timestamp: 21/06/2020 23:50:18, average loss: 2.338153, time duration: 83.884833,
                            number of examples in current reporting: 800, step 36000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  38%|███▊      | 4036/10688 [56:47<1:37:31,  1.14it/s]

timestamp: 21/06/2020 23:51:47, average loss: 2.340182, time duration: 88.468064,
                            number of examples in current reporting: 800, step 36100
                            out of total 10000000


Iteration:  39%|███▊      | 4136/10688 [58:10<1:28:48,  1.23it/s]

timestamp: 21/06/2020 23:53:10, average loss: 2.334949, time duration: 83.203947,
                            number of examples in current reporting: 800, step 36200
                            out of total 10000000


Iteration:  40%|███▉      | 4236/10688 [59:33<1:30:29,  1.19it/s]

timestamp: 21/06/2020 23:54:33, average loss: 2.342579, time duration: 83.601800,
                            number of examples in current reporting: 800, step 36300
                            out of total 10000000


Iteration:  41%|████      | 4336/10688 [1:00:58<1:35:00,  1.11it/s]

timestamp: 21/06/2020 23:55:58, average loss: 2.346722, time duration: 84.290972,
                            number of examples in current reporting: 800, step 36400
                            out of total 10000000


Iteration:  42%|████▏     | 4436/10688 [1:02:21<1:26:37,  1.20it/s]

timestamp: 21/06/2020 23:57:21, average loss: 2.295624, time duration: 83.398814,
                            number of examples in current reporting: 800, step 36500
                            out of total 10000000


Iteration:  42%|████▏     | 4536/10688 [1:03:45<1:28:15,  1.16it/s]

timestamp: 21/06/2020 23:58:45, average loss: 2.252895, time duration: 84.183463,
                            number of examples in current reporting: 800, step 36600
                            out of total 10000000


Iteration:  43%|████▎     | 4636/10688 [1:05:09<1:19:18,  1.27it/s]

timestamp: 22/06/2020 00:00:09, average loss: 2.304598, time duration: 83.712342,
                            number of examples in current reporting: 800, step 36700
                            out of total 10000000


Iteration:  44%|████▍     | 4736/10688 [1:06:33<1:25:20,  1.16it/s]

timestamp: 22/06/2020 00:01:33, average loss: 2.358064, time duration: 84.437785,
                            number of examples in current reporting: 800, step 36800
                            out of total 10000000


Iteration:  45%|████▌     | 4836/10688 [1:07:57<1:19:07,  1.23it/s]

timestamp: 22/06/2020 00:02:56, average loss: 2.283537, time duration: 83.157912,
                            number of examples in current reporting: 800, step 36900
                            out of total 10000000


Iteration:  46%|████▌     | 4935/10688 [1:09:20<1:17:44,  1.23it/s]

timestamp: 22/06/2020 00:04:21, average loss: 2.298702, time duration: 84.518381,
                            number of examples in current reporting: 800, step 37000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  47%|████▋     | 5036/10688 [1:10:48<1:17:23,  1.22it/s]

timestamp: 22/06/2020 00:05:48, average loss: 2.364263, time duration: 86.876688,
                            number of examples in current reporting: 800, step 37100
                            out of total 10000000


Iteration:  48%|████▊     | 5136/10688 [1:12:12<1:16:22,  1.21it/s]

timestamp: 22/06/2020 00:07:12, average loss: 2.333580, time duration: 84.352861,
                            number of examples in current reporting: 800, step 37200
                            out of total 10000000


Iteration:  49%|████▉     | 5236/10688 [1:13:36<1:12:24,  1.25it/s]

timestamp: 22/06/2020 00:08:36, average loss: 2.342640, time duration: 83.611840,
                            number of examples in current reporting: 800, step 37300
                            out of total 10000000


Iteration:  50%|████▉     | 5336/10688 [1:14:59<1:13:10,  1.22it/s]

timestamp: 22/06/2020 00:09:59, average loss: 2.307882, time duration: 83.469604,
                            number of examples in current reporting: 800, step 37400
                            out of total 10000000


Iteration:  51%|█████     | 5436/10688 [1:16:25<1:11:44,  1.22it/s]

timestamp: 22/06/2020 00:11:25, average loss: 2.366714, time duration: 85.451819,
                            number of examples in current reporting: 800, step 37500
                            out of total 10000000


Iteration:  52%|█████▏    | 5536/10688 [1:17:49<1:11:56,  1.19it/s]

timestamp: 22/06/2020 00:12:49, average loss: 2.318595, time duration: 84.325682,
                            number of examples in current reporting: 800, step 37600
                            out of total 10000000


Iteration:  53%|█████▎    | 5636/10688 [1:19:12<1:09:59,  1.20it/s]

timestamp: 22/06/2020 00:14:12, average loss: 2.263720, time duration: 82.640790,
                            number of examples in current reporting: 800, step 37700
                            out of total 10000000


Iteration:  54%|█████▎    | 5736/10688 [1:20:36<1:14:31,  1.11it/s]

timestamp: 22/06/2020 00:15:36, average loss: 2.305715, time duration: 84.179815,
                            number of examples in current reporting: 800, step 37800
                            out of total 10000000


Iteration:  55%|█████▍    | 5836/10688 [1:22:01<1:09:43,  1.16it/s]

timestamp: 22/06/2020 00:17:00, average loss: 2.387094, time duration: 84.514506,
                            number of examples in current reporting: 800, step 37900
                            out of total 10000000


Iteration:  56%|█████▌    | 5935/10688 [1:23:23<1:05:44,  1.21it/s]

timestamp: 22/06/2020 00:18:24, average loss: 2.312015, time duration: 83.398030,
                            number of examples in current reporting: 800, step 38000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  56%|█████▋    | 6036/10688 [1:24:52<1:05:09,  1.19it/s]

timestamp: 22/06/2020 00:19:52, average loss: 2.319616, time duration: 88.339958,
                            number of examples in current reporting: 800, step 38100
                            out of total 10000000


Iteration:  57%|█████▋    | 6136/10688 [1:26:17<1:04:19,  1.18it/s]

timestamp: 22/06/2020 00:21:17, average loss: 2.272500, time duration: 84.849372,
                            number of examples in current reporting: 800, step 38200
                            out of total 10000000


Iteration:  58%|█████▊    | 6236/10688 [1:27:43<1:08:06,  1.09it/s]

timestamp: 22/06/2020 00:22:43, average loss: 2.320851, time duration: 85.631724,
                            number of examples in current reporting: 800, step 38300
                            out of total 10000000


Iteration:  59%|█████▉    | 6336/10688 [1:29:06<56:51,  1.28it/s]  

timestamp: 22/06/2020 00:24:06, average loss: 2.303614, time duration: 83.211449,
                            number of examples in current reporting: 800, step 38400
                            out of total 10000000


Iteration:  60%|██████    | 6436/10688 [1:30:30<58:28,  1.21it/s]  

timestamp: 22/06/2020 00:25:29, average loss: 2.303559, time duration: 83.592402,
                            number of examples in current reporting: 800, step 38500
                            out of total 10000000


Iteration:  61%|██████    | 6536/10688 [1:31:54<56:50,  1.22it/s]  

timestamp: 22/06/2020 00:26:54, average loss: 2.330216, time duration: 84.292786,
                            number of examples in current reporting: 800, step 38600
                            out of total 10000000


Iteration:  62%|██████▏   | 6636/10688 [1:33:18<59:33,  1.13it/s]  

timestamp: 22/06/2020 00:28:18, average loss: 2.353792, time duration: 84.008525,
                            number of examples in current reporting: 800, step 38700
                            out of total 10000000


Iteration:  63%|██████▎   | 6736/10688 [1:34:41<52:43,  1.25it/s]  

timestamp: 22/06/2020 00:29:41, average loss: 2.352834, time duration: 83.610034,
                            number of examples in current reporting: 800, step 38800
                            out of total 10000000


Iteration:  64%|██████▍   | 6836/10688 [1:36:06<52:12,  1.23it/s]  

timestamp: 22/06/2020 00:31:06, average loss: 2.309672, time duration: 84.245239,
                            number of examples in current reporting: 800, step 38900
                            out of total 10000000


Iteration:  65%|██████▍   | 6935/10688 [1:37:31<57:49,  1.08it/s]  

timestamp: 22/06/2020 00:32:32, average loss: 2.299967, time duration: 85.958566,
                            number of examples in current reporting: 800, step 39000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  66%|██████▌   | 7036/10688 [1:39:01<49:18,  1.23it/s]  

timestamp: 22/06/2020 00:34:00, average loss: 2.336604, time duration: 88.913298,
                            number of examples in current reporting: 800, step 39100
                            out of total 10000000


Iteration:  67%|██████▋   | 7136/10688 [1:40:26<52:39,  1.12it/s]

timestamp: 22/06/2020 00:35:26, average loss: 2.342584, time duration: 85.789818,
                            number of examples in current reporting: 800, step 39200
                            out of total 10000000


Iteration:  68%|██████▊   | 7236/10688 [1:41:52<47:16,  1.22it/s]

timestamp: 22/06/2020 00:36:52, average loss: 2.308331, time duration: 85.395571,
                            number of examples in current reporting: 800, step 39300
                            out of total 10000000


Iteration:  69%|██████▊   | 7336/10688 [1:43:18<50:53,  1.10it/s]

timestamp: 22/06/2020 00:38:18, average loss: 2.342120, time duration: 86.059627,
                            number of examples in current reporting: 800, step 39400
                            out of total 10000000


Iteration:  70%|██████▉   | 7436/10688 [1:44:44<46:02,  1.18it/s]

timestamp: 22/06/2020 00:39:44, average loss: 2.295734, time duration: 86.555046,
                            number of examples in current reporting: 800, step 39500
                            out of total 10000000


Iteration:  71%|███████   | 7536/10688 [1:46:11<47:32,  1.10it/s]

timestamp: 22/06/2020 00:41:11, average loss: 2.333450, time duration: 86.336341,
                            number of examples in current reporting: 800, step 39600
                            out of total 10000000


Iteration:  71%|███████▏  | 7636/10688 [1:47:37<43:08,  1.18it/s]

timestamp: 22/06/2020 00:42:37, average loss: 2.313056, time duration: 86.350742,
                            number of examples in current reporting: 800, step 39700
                            out of total 10000000


Iteration:  72%|███████▏  | 7736/10688 [1:49:01<39:12,  1.25it/s]

timestamp: 22/06/2020 00:44:01, average loss: 2.255406, time duration: 83.896225,
                            number of examples in current reporting: 800, step 39800
                            out of total 10000000


Iteration:  73%|███████▎  | 7836/10688 [1:50:25<41:07,  1.16it/s]

timestamp: 22/06/2020 00:45:25, average loss: 2.262218, time duration: 84.232606,
                            number of examples in current reporting: 800, step 39900
                            out of total 10000000


Iteration:  74%|███████▍  | 7935/10688 [1:51:48<37:05,  1.24it/s]

timestamp: 22/06/2020 00:46:49, average loss: 2.302838, time duration: 83.945617,
                            number of examples in current reporting: 800, step 40000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  75%|███████▌  | 8036/10688 [1:53:17<39:36,  1.12it/s]  

timestamp: 22/06/2020 00:48:17, average loss: 2.302205, time duration: 87.674274,
                            number of examples in current reporting: 800, step 40100
                            out of total 10000000


Iteration:  76%|███████▌  | 8136/10688 [1:54:41<36:20,  1.17it/s]

timestamp: 22/06/2020 00:49:41, average loss: 2.321422, time duration: 84.040834,
                            number of examples in current reporting: 800, step 40200
                            out of total 10000000


Iteration:  77%|███████▋  | 8236/10688 [1:56:08<35:23,  1.15it/s]

timestamp: 22/06/2020 00:51:08, average loss: 2.318630, time duration: 87.201530,
                            number of examples in current reporting: 800, step 40300
                            out of total 10000000


Iteration:  78%|███████▊  | 8336/10688 [1:57:37<35:54,  1.09it/s]

timestamp: 22/06/2020 00:52:36, average loss: 2.297263, time duration: 88.502790,
                            number of examples in current reporting: 800, step 40400
                            out of total 10000000


Iteration:  79%|███████▉  | 8436/10688 [1:59:06<34:09,  1.10it/s]

timestamp: 22/06/2020 00:54:06, average loss: 2.316969, time duration: 89.061567,
                            number of examples in current reporting: 800, step 40500
                            out of total 10000000


Iteration:  80%|███████▉  | 8536/10688 [2:00:33<30:36,  1.17it/s]

timestamp: 22/06/2020 00:55:33, average loss: 2.293547, time duration: 87.872432,
                            number of examples in current reporting: 800, step 40600
                            out of total 10000000


Iteration:  81%|████████  | 8636/10688 [2:01:59<27:24,  1.25it/s]

timestamp: 22/06/2020 00:56:59, average loss: 2.329330, time duration: 85.416920,
                            number of examples in current reporting: 800, step 40700
                            out of total 10000000


Iteration:  82%|████████▏ | 8736/10688 [2:03:22<28:29,  1.14it/s]

timestamp: 22/06/2020 00:58:22, average loss: 2.265644, time duration: 83.355553,
                            number of examples in current reporting: 800, step 40800
                            out of total 10000000


Iteration:  83%|████████▎ | 8836/10688 [2:04:46<24:44,  1.25it/s]

timestamp: 22/06/2020 00:59:46, average loss: 2.357124, time duration: 83.880306,
                            number of examples in current reporting: 800, step 40900
                            out of total 10000000


Iteration:  84%|████████▎ | 8935/10688 [2:06:10<25:35,  1.14it/s]

timestamp: 22/06/2020 01:01:11, average loss: 2.319015, time duration: 84.703009,
                            number of examples in current reporting: 800, step 41000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  85%|████████▍ | 9036/10688 [2:07:39<22:44,  1.21it/s]

timestamp: 22/06/2020 01:02:39, average loss: 2.270117, time duration: 87.927260,
                            number of examples in current reporting: 800, step 41100
                            out of total 10000000


Iteration:  85%|████████▌ | 9136/10688 [2:09:03<20:57,  1.23it/s]

timestamp: 22/06/2020 01:04:03, average loss: 2.259966, time duration: 84.349773,
                            number of examples in current reporting: 800, step 41200
                            out of total 10000000


Iteration:  86%|████████▋ | 9236/10688 [2:10:29<21:23,  1.13it/s]

timestamp: 22/06/2020 01:05:29, average loss: 2.261295, time duration: 86.018192,
                            number of examples in current reporting: 800, step 41300
                            out of total 10000000


Iteration:  87%|████████▋ | 9336/10688 [2:11:54<18:28,  1.22it/s]

timestamp: 22/06/2020 01:06:54, average loss: 2.336667, time duration: 84.663954,
                            number of examples in current reporting: 800, step 41400
                            out of total 10000000


Iteration:  88%|████████▊ | 9436/10688 [2:13:20<17:28,  1.19it/s]

timestamp: 22/06/2020 01:08:19, average loss: 2.351904, time duration: 85.737966,
                            number of examples in current reporting: 800, step 41500
                            out of total 10000000


Iteration:  89%|████████▉ | 9536/10688 [2:14:44<16:39,  1.15it/s]

timestamp: 22/06/2020 01:09:44, average loss: 2.302978, time duration: 84.635662,
                            number of examples in current reporting: 800, step 41600
                            out of total 10000000


Iteration:  90%|█████████ | 9636/10688 [2:16:10<15:07,  1.16it/s]

timestamp: 22/06/2020 01:11:10, average loss: 2.361027, time duration: 85.977451,
                            number of examples in current reporting: 800, step 41700
                            out of total 10000000


Iteration:  91%|█████████ | 9736/10688 [2:17:35<14:01,  1.13it/s]

timestamp: 22/06/2020 01:12:35, average loss: 2.332284, time duration: 84.980008,
                            number of examples in current reporting: 800, step 41800
                            out of total 10000000


Iteration:  92%|█████████▏| 9836/10688 [2:19:01<11:52,  1.20it/s]

timestamp: 22/06/2020 01:14:01, average loss: 2.319427, time duration: 86.108116,
                            number of examples in current reporting: 800, step 41900
                            out of total 10000000


Iteration:  93%|█████████▎| 9935/10688 [2:20:25<09:54,  1.27it/s]

timestamp: 22/06/2020 01:15:26, average loss: 2.246608, time duration: 84.661574,
                            number of examples in current reporting: 800, step 42000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  94%|█████████▍| 10036/10688 [2:21:54<09:15,  1.17it/s]

timestamp: 22/06/2020 01:16:54, average loss: 2.303847, time duration: 88.542682,
                            number of examples in current reporting: 800, step 42100
                            out of total 10000000


Iteration:  95%|█████████▍| 10136/10688 [2:23:20<08:28,  1.09it/s]

timestamp: 22/06/2020 01:18:20, average loss: 2.305515, time duration: 85.414719,
                            number of examples in current reporting: 800, step 42200
                            out of total 10000000


Iteration:  96%|█████████▌| 10236/10688 [2:24:46<06:33,  1.15it/s]

timestamp: 22/06/2020 01:19:46, average loss: 2.296599, time duration: 85.853866,
                            number of examples in current reporting: 800, step 42300
                            out of total 10000000


Iteration:  97%|█████████▋| 10336/10688 [2:26:10<04:39,  1.26it/s]

timestamp: 22/06/2020 01:21:10, average loss: 2.270323, time duration: 84.186280,
                            number of examples in current reporting: 800, step 42400
                            out of total 10000000


Iteration:  98%|█████████▊| 10436/10688 [2:27:35<03:41,  1.14it/s]

timestamp: 22/06/2020 01:22:35, average loss: 2.310229, time duration: 85.352051,
                            number of examples in current reporting: 800, step 42500
                            out of total 10000000


Iteration:  99%|█████████▊| 10536/10688 [2:28:59<02:02,  1.24it/s]

timestamp: 22/06/2020 01:23:59, average loss: 2.237330, time duration: 83.641792,
                            number of examples in current reporting: 800, step 42600
                            out of total 10000000


Iteration: 100%|█████████▉| 10636/10688 [2:30:23<00:43,  1.19it/s]

timestamp: 22/06/2020 01:25:23, average loss: 2.296950, time duration: 84.097801,
                            number of examples in current reporting: 800, step 42700
                            out of total 10000000


Iteration: 100%|██████████| 10688/10688 [2:31:07<00:00,  1.18it/s]
Iteration:   0%|          | 48/10688 [00:43<2:49:10,  1.05it/s]

timestamp: 22/06/2020 01:26:50, average loss: 2.243336, time duration: 86.899170,
                            number of examples in current reporting: 796, step 42800
                            out of total 10000000


Iteration:   1%|▏         | 148/10688 [02:09<2:26:46,  1.20it/s]

timestamp: 22/06/2020 01:28:16, average loss: 2.202838, time duration: 86.253572,
                            number of examples in current reporting: 800, step 42900
                            out of total 10000000


Iteration:   2%|▏         | 247/10688 [03:33<2:35:08,  1.12it/s]

timestamp: 22/06/2020 01:29:41, average loss: 2.236073, time duration: 84.591691,
                            number of examples in current reporting: 800, step 43000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:   3%|▎         | 348/10688 [05:01<2:19:48,  1.23it/s]

timestamp: 22/06/2020 01:31:08, average loss: 2.229033, time duration: 87.447171,
                            number of examples in current reporting: 800, step 43100
                            out of total 10000000


Iteration:   4%|▍         | 448/10688 [06:25<2:32:16,  1.12it/s]

timestamp: 22/06/2020 01:32:32, average loss: 2.228117, time duration: 83.821301,
                            number of examples in current reporting: 800, step 43200
                            out of total 10000000


Iteration:   5%|▌         | 548/10688 [07:48<2:19:20,  1.21it/s]

timestamp: 22/06/2020 01:33:55, average loss: 2.211125, time duration: 82.756601,
                            number of examples in current reporting: 800, step 43300
                            out of total 10000000


Iteration:   6%|▌         | 648/10688 [09:11<2:13:25,  1.25it/s]

timestamp: 22/06/2020 01:35:18, average loss: 2.203174, time duration: 82.961278,
                            number of examples in current reporting: 800, step 43400
                            out of total 10000000


Iteration:   8%|▊         | 848/10688 [11:59<2:29:28,  1.10it/s]

timestamp: 22/06/2020 01:38:06, average loss: 2.203531, time duration: 83.982231,
                            number of examples in current reporting: 800, step 43600
                            out of total 10000000


Iteration:   9%|▉         | 948/10688 [13:23<2:16:16,  1.19it/s]

timestamp: 22/06/2020 01:39:30, average loss: 2.197373, time duration: 83.295434,
                            number of examples in current reporting: 800, step 43700
                            out of total 10000000


Iteration:  10%|▉         | 1048/10688 [14:47<2:13:20,  1.20it/s]

timestamp: 22/06/2020 01:40:55, average loss: 2.304416, time duration: 84.740831,
                            number of examples in current reporting: 800, step 43800
                            out of total 10000000


Iteration:  11%|█         | 1148/10688 [16:11<2:09:54,  1.22it/s]

timestamp: 22/06/2020 01:42:18, average loss: 2.170259, time duration: 83.895816,
                            number of examples in current reporting: 800, step 43900
                            out of total 10000000


Iteration:  12%|█▏        | 1247/10688 [17:37<2:11:57,  1.19it/s]

timestamp: 22/06/2020 01:43:45, average loss: 2.246134, time duration: 86.263262,
                            number of examples in current reporting: 800, step 44000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  13%|█▎        | 1348/10688 [19:08<2:31:28,  1.03it/s]

timestamp: 22/06/2020 01:45:15, average loss: 2.224515, time duration: 90.355286,
                            number of examples in current reporting: 800, step 44100
                            out of total 10000000


Iteration:  14%|█▎        | 1448/10688 [20:33<2:07:03,  1.21it/s]

timestamp: 22/06/2020 01:46:40, average loss: 2.140899, time duration: 85.440123,
                            number of examples in current reporting: 800, step 44200
                            out of total 10000000


Iteration:  14%|█▍        | 1548/10688 [21:59<2:12:23,  1.15it/s]

timestamp: 22/06/2020 01:48:06, average loss: 2.181431, time duration: 85.590627,
                            number of examples in current reporting: 800, step 44300
                            out of total 10000000


Iteration:  15%|█▌        | 1648/10688 [23:25<2:11:38,  1.14it/s]

timestamp: 22/06/2020 01:49:32, average loss: 2.232002, time duration: 85.754274,
                            number of examples in current reporting: 800, step 44400
                            out of total 10000000


Iteration:  16%|█▋        | 1748/10688 [24:49<1:59:29,  1.25it/s]

timestamp: 22/06/2020 01:50:56, average loss: 2.205156, time duration: 83.716357,
                            number of examples in current reporting: 800, step 44500
                            out of total 10000000


Iteration:  17%|█▋        | 1848/10688 [26:12<1:56:43,  1.26it/s]

timestamp: 22/06/2020 01:52:19, average loss: 2.187624, time duration: 83.786623,
                            number of examples in current reporting: 800, step 44600
                            out of total 10000000


Iteration:  18%|█▊        | 1948/10688 [27:37<2:10:50,  1.11it/s]

timestamp: 22/06/2020 01:53:44, average loss: 2.274879, time duration: 84.814944,
                            number of examples in current reporting: 800, step 44700
                            out of total 10000000


Iteration:  19%|█▉        | 2048/10688 [29:00<2:01:59,  1.18it/s]

timestamp: 22/06/2020 01:55:08, average loss: 2.239564, time duration: 83.376859,
                            number of examples in current reporting: 800, step 44800
                            out of total 10000000


Iteration:  20%|██        | 2148/10688 [30:24<2:00:32,  1.18it/s]

timestamp: 22/06/2020 01:56:31, average loss: 2.201958, time duration: 83.606603,
                            number of examples in current reporting: 800, step 44900
                            out of total 10000000


Iteration:  21%|██        | 2247/10688 [31:48<2:04:32,  1.13it/s]

timestamp: 22/06/2020 01:57:56, average loss: 2.226119, time duration: 84.587406,
                            number of examples in current reporting: 800, step 45000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  22%|██▏       | 2348/10688 [33:16<1:51:10,  1.25it/s]

timestamp: 22/06/2020 01:59:23, average loss: 2.282918, time duration: 87.666717,
                            number of examples in current reporting: 800, step 45100
                            out of total 10000000


Iteration:  23%|██▎       | 2448/10688 [34:40<1:51:21,  1.23it/s]

timestamp: 22/06/2020 02:00:47, average loss: 2.185142, time duration: 83.675094,
                            number of examples in current reporting: 800, step 45200
                            out of total 10000000


Iteration:  24%|██▍       | 2548/10688 [36:06<1:59:05,  1.14it/s]

timestamp: 22/06/2020 02:02:13, average loss: 2.175784, time duration: 86.390477,
                            number of examples in current reporting: 800, step 45300
                            out of total 10000000


Iteration:  25%|██▍       | 2648/10688 [37:33<2:04:15,  1.08it/s]

timestamp: 22/06/2020 02:03:40, average loss: 2.228802, time duration: 86.920397,
                            number of examples in current reporting: 800, step 45400
                            out of total 10000000


Iteration:  26%|██▌       | 2748/10688 [39:02<1:58:34,  1.12it/s]

timestamp: 22/06/2020 02:05:09, average loss: 2.248930, time duration: 88.821792,
                            number of examples in current reporting: 800, step 45500
                            out of total 10000000


Iteration:  27%|██▋       | 2848/10688 [40:31<1:48:54,  1.20it/s]

timestamp: 22/06/2020 02:06:38, average loss: 2.244242, time duration: 88.962336,
                            number of examples in current reporting: 800, step 45600
                            out of total 10000000


Iteration:  28%|██▊       | 2948/10688 [42:01<1:57:32,  1.10it/s]

timestamp: 22/06/2020 02:08:08, average loss: 2.245901, time duration: 89.509425,
                            number of examples in current reporting: 800, step 45700
                            out of total 10000000


Iteration:  29%|██▊       | 3048/10688 [43:26<1:42:57,  1.24it/s]

timestamp: 22/06/2020 02:09:33, average loss: 2.160661, time duration: 85.241620,
                            number of examples in current reporting: 800, step 45800
                            out of total 10000000


Iteration:  29%|██▉       | 3148/10688 [44:50<1:49:11,  1.15it/s]

timestamp: 22/06/2020 02:10:57, average loss: 2.258503, time duration: 84.340068,
                            number of examples in current reporting: 800, step 45900
                            out of total 10000000


Iteration:  30%|███       | 3247/10688 [46:12<1:45:02,  1.18it/s]

timestamp: 22/06/2020 02:12:20, average loss: 2.196404, time duration: 82.952429,
                            number of examples in current reporting: 800, step 46000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  31%|███▏      | 3348/10688 [47:45<1:50:14,  1.11it/s]

timestamp: 22/06/2020 02:13:52, average loss: 2.188280, time duration: 91.569305,
                            number of examples in current reporting: 800, step 46100
                            out of total 10000000


Iteration:  32%|███▏      | 3448/10688 [49:14<1:47:35,  1.12it/s]

timestamp: 22/06/2020 02:15:21, average loss: 2.257805, time duration: 89.425404,
                            number of examples in current reporting: 800, step 46200
                            out of total 10000000


Iteration:  33%|███▎      | 3548/10688 [50:39<1:40:32,  1.18it/s]

timestamp: 22/06/2020 02:16:46, average loss: 2.179342, time duration: 84.784894,
                            number of examples in current reporting: 800, step 46300
                            out of total 10000000


Iteration:  34%|███▍      | 3648/10688 [52:03<1:35:37,  1.23it/s]

timestamp: 22/06/2020 02:18:10, average loss: 2.195604, time duration: 83.723634,
                            number of examples in current reporting: 800, step 46400
                            out of total 10000000


Iteration:  35%|███▌      | 3748/10688 [53:26<1:36:24,  1.20it/s]

timestamp: 22/06/2020 02:19:33, average loss: 2.235675, time duration: 83.636162,
                            number of examples in current reporting: 800, step 46500
                            out of total 10000000


Iteration:  35%|███▌      | 3786/10688 [53:58<1:47:58,  1.07it/s]

Iteration:  75%|███████▌  | 8048/10688 [1:53:59<35:01,  1.26it/s]

timestamp: 22/06/2020 03:20:06, average loss: 2.083762, time duration: 84.341743,
                            number of examples in current reporting: 800, step 50800
                            out of total 10000000


Iteration:  76%|███████▌  | 8148/10688 [1:55:24<37:11,  1.14it/s]

timestamp: 22/06/2020 03:21:31, average loss: 2.202515, time duration: 85.628312,
                            number of examples in current reporting: 800, step 50900
                            out of total 10000000


Iteration:  77%|███████▋  | 8247/10688 [1:56:48<34:09,  1.19it/s]

timestamp: 22/06/2020 03:22:56, average loss: 2.183090, time duration: 84.928757,
                            number of examples in current reporting: 800, step 51000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  78%|███████▊  | 8348/10688 [1:58:16<32:59,  1.18it/s]  

timestamp: 22/06/2020 03:24:23, average loss: 2.183712, time duration: 87.375290,
                            number of examples in current reporting: 800, step 51100
                            out of total 10000000


Iteration:  79%|███████▉  | 8448/10688 [1:59:40<29:06,  1.28it/s]

timestamp: 22/06/2020 03:25:47, average loss: 2.190938, time duration: 83.594820,
                            number of examples in current reporting: 800, step 51200
                            out of total 10000000


Iteration:  80%|███████▉  | 8548/10688 [2:01:04<27:09,  1.31it/s]

timestamp: 22/06/2020 03:27:11, average loss: 2.199099, time duration: 84.041721,
                            number of examples in current reporting: 800, step 51300
                            out of total 10000000


Iteration:  81%|████████  | 8648/10688 [2:02:27<29:30,  1.15it/s]

timestamp: 22/06/2020 03:28:34, average loss: 2.181229, time duration: 83.154757,
                            number of examples in current reporting: 800, step 51400
                            out of total 10000000


Iteration:  82%|████████▏ | 8748/10688 [2:03:51<26:55,  1.20it/s]

timestamp: 22/06/2020 03:29:58, average loss: 2.128630, time duration: 83.382838,
                            number of examples in current reporting: 800, step 51500
                            out of total 10000000


Iteration:  83%|████████▎ | 8848/10688 [2:05:13<25:14,  1.22it/s]

timestamp: 22/06/2020 03:31:20, average loss: 2.210244, time duration: 82.745538,
                            number of examples in current reporting: 800, step 51600
                            out of total 10000000


Iteration:  84%|████████▎ | 8948/10688 [2:06:38<25:22,  1.14it/s]

timestamp: 22/06/2020 03:32:45, average loss: 2.238288, time duration: 84.215688,
                            number of examples in current reporting: 800, step 51700
                            out of total 10000000


Iteration:  85%|████████▍ | 9048/10688 [2:08:01<20:51,  1.31it/s]

timestamp: 22/06/2020 03:34:08, average loss: 2.173464, time duration: 83.609825,
                            number of examples in current reporting: 800, step 51800
                            out of total 10000000


Iteration:  86%|████████▌ | 9148/10688 [2:09:26<22:20,  1.15it/s]

timestamp: 22/06/2020 03:35:33, average loss: 2.163898, time duration: 84.685190,
                            number of examples in current reporting: 800, step 51900
                            out of total 10000000


Iteration:  87%|████████▋ | 9247/10688 [2:10:50<20:41,  1.16it/s]

timestamp: 22/06/2020 03:36:58, average loss: 2.211973, time duration: 84.658115,
                            number of examples in current reporting: 800, step 52000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  87%|████████▋ | 9348/10688 [2:12:17<17:37,  1.27it/s]

timestamp: 22/06/2020 03:38:24, average loss: 2.204777, time duration: 86.537672,
                            number of examples in current reporting: 800, step 52100
                            out of total 10000000


Iteration:  88%|████████▊ | 9448/10688 [2:13:40<17:32,  1.18it/s]

timestamp: 22/06/2020 03:39:47, average loss: 2.166408, time duration: 83.350130,
                            number of examples in current reporting: 800, step 52200
                            out of total 10000000


Iteration:  89%|████████▉ | 9548/10688 [2:15:04<15:23,  1.23it/s]

timestamp: 22/06/2020 03:41:11, average loss: 2.241498, time duration: 83.912807,
                            number of examples in current reporting: 800, step 52300
                            out of total 10000000


Iteration:  90%|█████████ | 9648/10688 [2:16:28<14:39,  1.18it/s]

timestamp: 22/06/2020 03:42:35, average loss: 2.152585, time duration: 83.658691,
                            number of examples in current reporting: 800, step 52400
                            out of total 10000000


Iteration:  91%|█████████ | 9748/10688 [2:17:52<12:49,  1.22it/s]

timestamp: 22/06/2020 03:43:59, average loss: 2.191486, time duration: 83.675314,
                            number of examples in current reporting: 800, step 52500
                            out of total 10000000


Iteration:  92%|█████████▏| 9848/10688 [2:19:16<12:06,  1.16it/s]

timestamp: 22/06/2020 03:45:23, average loss: 2.199373, time duration: 83.981499,
                            number of examples in current reporting: 800, step 52600
                            out of total 10000000


Iteration:  93%|█████████▎| 9948/10688 [2:20:40<10:24,  1.18it/s]

timestamp: 22/06/2020 03:46:47, average loss: 2.201174, time duration: 84.153856,
                            number of examples in current reporting: 800, step 52700
                            out of total 10000000


Iteration:  94%|█████████▍| 10048/10688 [2:22:04<08:54,  1.20it/s]

timestamp: 22/06/2020 03:48:11, average loss: 2.204016, time duration: 84.347241,
                            number of examples in current reporting: 800, step 52800
                            out of total 10000000


Iteration:  95%|█████████▍| 10148/10688 [2:23:27<07:20,  1.23it/s]

timestamp: 22/06/2020 03:49:35, average loss: 2.161560, time duration: 83.329085,
                            number of examples in current reporting: 800, step 52900
                            out of total 10000000


Iteration:  96%|█████████▌| 10247/10688 [2:24:49<06:09,  1.19it/s]

timestamp: 22/06/2020 03:50:57, average loss: 2.141003, time duration: 82.535743,
                            number of examples in current reporting: 800, step 53000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  97%|█████████▋| 10348/10688 [2:26:18<04:40,  1.21it/s]

timestamp: 22/06/2020 03:52:25, average loss: 2.203958, time duration: 88.081768,
                            number of examples in current reporting: 800, step 53100
                            out of total 10000000


Iteration:  98%|█████████▊| 10448/10688 [2:27:42<03:19,  1.20it/s]

timestamp: 22/06/2020 03:53:49, average loss: 2.179805, time duration: 84.123743,
                            number of examples in current reporting: 800, step 53200
                            out of total 10000000


Iteration:  99%|█████████▊| 10548/10688 [2:29:07<02:00,  1.17it/s]

timestamp: 22/06/2020 03:55:14, average loss: 2.144654, time duration: 84.318659,
                            number of examples in current reporting: 800, step 53300
                            out of total 10000000


Iteration: 100%|█████████▉| 10648/10688 [2:30:31<00:33,  1.18it/s]

timestamp: 22/06/2020 03:56:38, average loss: 2.179874, time duration: 84.527066,
                            number of examples in current reporting: 800, step 53400
                            out of total 10000000


Iteration: 100%|██████████| 10688/10688 [2:31:04<00:00,  1.18it/s]
Iteration:   1%|          | 60/10688 [00:50<2:35:38,  1.14it/s]

timestamp: 22/06/2020 03:58:02, average loss: 2.091669, time duration: 83.499246,
                            number of examples in current reporting: 796, step 53500
                            out of total 10000000


Iteration:   1%|▏         | 160/10688 [02:15<2:34:50,  1.13it/s]

timestamp: 22/06/2020 03:59:27, average loss: 2.079622, time duration: 84.997754,
                            number of examples in current reporting: 800, step 53600
                            out of total 10000000


Iteration:   2%|▏         | 260/10688 [03:43<2:24:33,  1.20it/s]

timestamp: 22/06/2020 04:00:54, average loss: 2.118323, time duration: 87.595699,
                            number of examples in current reporting: 800, step 53700
                            out of total 10000000


Iteration:   3%|▎         | 295/10688 [04:13<2:23:17,  1.21it/s]

Iteration:  46%|████▋     | 4960/10688 [1:10:17<1:16:42,  1.24it/s]

timestamp: 22/06/2020 05:07:28, average loss: 2.116890, time duration: 84.035245,
                            number of examples in current reporting: 800, step 58400
                            out of total 10000000


Iteration:  47%|████▋     | 5060/10688 [1:11:41<1:20:24,  1.17it/s]

timestamp: 22/06/2020 05:08:52, average loss: 2.181912, time duration: 84.455152,
                            number of examples in current reporting: 800, step 58500
                            out of total 10000000


Iteration:  48%|████▊     | 5160/10688 [1:13:06<1:23:54,  1.10it/s]

timestamp: 22/06/2020 05:10:17, average loss: 2.093297, time duration: 84.620158,
                            number of examples in current reporting: 800, step 58600
                            out of total 10000000


Iteration:  49%|████▉     | 5260/10688 [1:14:30<1:12:38,  1.25it/s]

timestamp: 22/06/2020 05:11:41, average loss: 2.087385, time duration: 83.857832,
                            number of examples in current reporting: 800, step 58700
                            out of total 10000000


Iteration:  50%|█████     | 5360/10688 [1:15:54<1:17:57,  1.14it/s]

timestamp: 22/06/2020 05:13:06, average loss: 2.130576, time duration: 84.808764,
                            number of examples in current reporting: 800, step 58800
                            out of total 10000000


Iteration:  51%|█████     | 5460/10688 [1:17:18<1:12:19,  1.20it/s]

timestamp: 22/06/2020 05:14:29, average loss: 2.050666, time duration: 83.343014,
                            number of examples in current reporting: 800, step 58900
                            out of total 10000000


Iteration:  52%|█████▏    | 5559/10688 [1:18:42<1:12:44,  1.18it/s]

timestamp: 22/06/2020 05:15:54, average loss: 2.123273, time duration: 84.859774,
                            number of examples in current reporting: 800, step 59000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  53%|█████▎    | 5660/10688 [1:20:13<1:10:18,  1.19it/s]

timestamp: 22/06/2020 05:17:24, average loss: 2.108385, time duration: 90.059807,
                            number of examples in current reporting: 800, step 59100
                            out of total 10000000


Iteration:  54%|█████▍    | 5760/10688 [1:21:38<1:10:55,  1.16it/s]

timestamp: 22/06/2020 05:18:49, average loss: 2.081441, time duration: 84.976916,
                            number of examples in current reporting: 800, step 59200
                            out of total 10000000


Iteration:  55%|█████▍    | 5860/10688 [1:23:02<1:07:13,  1.20it/s]

timestamp: 22/06/2020 05:20:13, average loss: 2.141926, time duration: 84.001043,
                            number of examples in current reporting: 800, step 59300
                            out of total 10000000


Iteration:  56%|█████▌    | 5960/10688 [1:24:25<1:04:29,  1.22it/s]

timestamp: 22/06/2020 05:21:36, average loss: 2.100450, time duration: 82.860059,
                            number of examples in current reporting: 800, step 59400
                            out of total 10000000


Iteration:  57%|█████▋    | 6060/10688 [1:25:48<1:07:22,  1.14it/s]

timestamp: 22/06/2020 05:22:59, average loss: 2.111031, time duration: 83.366823,
                            number of examples in current reporting: 800, step 59500
                            out of total 10000000


Iteration:  58%|█████▊    | 6160/10688 [1:27:13<1:07:43,  1.11it/s]

timestamp: 22/06/2020 05:24:25, average loss: 2.132948, time duration: 85.552955,
                            number of examples in current reporting: 800, step 59600
                            out of total 10000000


Iteration:  59%|█████▊    | 6260/10688 [1:28:41<58:26,  1.26it/s]  

timestamp: 22/06/2020 05:25:52, average loss: 2.148064, time duration: 87.098697,
                            number of examples in current reporting: 800, step 59700
                            out of total 10000000


Iteration:  60%|█████▉    | 6360/10688 [1:30:06<1:02:37,  1.15it/s]

timestamp: 22/06/2020 05:27:17, average loss: 2.106563, time duration: 85.207819,
                            number of examples in current reporting: 800, step 59800
                            out of total 10000000


Iteration:  60%|██████    | 6460/10688 [1:31:31<1:02:38,  1.12it/s]

timestamp: 22/06/2020 05:28:43, average loss: 2.134684, time duration: 85.730824,
                            number of examples in current reporting: 800, step 59900
                            out of total 10000000


Iteration:  61%|██████▏   | 6559/10688 [1:32:55<59:30,  1.16it/s]  

timestamp: 22/06/2020 05:30:07, average loss: 2.095462, time duration: 84.570374,
                            number of examples in current reporting: 800, step 60000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  62%|██████▏   | 6660/10688 [1:34:24<55:04,  1.22it/s]  

timestamp: 22/06/2020 05:31:35, average loss: 2.158565, time duration: 87.746889,
                            number of examples in current reporting: 800, step 60100
                            out of total 10000000


Iteration:  63%|██████▎   | 6760/10688 [1:35:48<50:57,  1.28it/s]  

timestamp: 22/06/2020 05:32:59, average loss: 2.084840, time duration: 83.767584,
                            number of examples in current reporting: 800, step 60200
                            out of total 10000000


Iteration:  64%|██████▍   | 6860/10688 [1:37:14<56:05,  1.14it/s]  

timestamp: 22/06/2020 05:34:25, average loss: 2.192452, time duration: 86.132755,
                            number of examples in current reporting: 800, step 60300
                            out of total 10000000


Iteration:  65%|██████▌   | 6960/10688 [1:38:37<50:41,  1.23it/s]

timestamp: 22/06/2020 05:35:49, average loss: 2.062710, time duration: 83.767962,
                            number of examples in current reporting: 800, step 60400
                            out of total 10000000


Iteration:  66%|██████▌   | 7060/10688 [1:40:05<52:08,  1.16it/s]

timestamp: 22/06/2020 05:37:17, average loss: 2.171018, time duration: 88.021711,
                            number of examples in current reporting: 800, step 60500
                            out of total 10000000


Iteration:  67%|██████▋   | 7160/10688 [1:41:29<46:58,  1.25it/s]

timestamp: 22/06/2020 05:38:40, average loss: 2.090915, time duration: 83.466149,
                            number of examples in current reporting: 800, step 60600
                            out of total 10000000


Iteration:  68%|██████▊   | 7260/10688 [1:42:54<51:55,  1.10it/s]

timestamp: 22/06/2020 05:40:06, average loss: 2.109405, time duration: 85.266514,
                            number of examples in current reporting: 800, step 60700
                            out of total 10000000


Iteration:  69%|██████▉   | 7360/10688 [1:44:19<45:00,  1.23it/s]

timestamp: 22/06/2020 05:41:31, average loss: 2.131152, time duration: 85.191728,
                            number of examples in current reporting: 800, step 60800
                            out of total 10000000


Iteration:  70%|██████▉   | 7460/10688 [1:45:43<43:59,  1.22it/s]

timestamp: 22/06/2020 05:42:55, average loss: 2.063704, time duration: 83.784696,
                            number of examples in current reporting: 800, step 60900
                            out of total 10000000


Iteration:  71%|███████   | 7559/10688 [1:47:07<47:26,  1.10it/s]

timestamp: 22/06/2020 05:44:20, average loss: 2.129388, time duration: 85.035123,
                            number of examples in current reporting: 800, step 61000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  72%|███████▏  | 7660/10688 [1:48:38<41:56,  1.20it/s]  

timestamp: 22/06/2020 05:45:49, average loss: 2.055991, time duration: 89.486354,
                            number of examples in current reporting: 800, step 61100
                            out of total 10000000


Iteration:  73%|███████▎  | 7760/10688 [1:50:05<41:56,  1.16it/s]

timestamp: 22/06/2020 05:47:16, average loss: 2.165041, time duration: 87.033755,
                            number of examples in current reporting: 800, step 61200
                            out of total 10000000


Iteration:  74%|███████▎  | 7860/10688 [1:51:29<41:15,  1.14it/s]

timestamp: 22/06/2020 05:48:40, average loss: 2.090562, time duration: 83.912101,
                            number of examples in current reporting: 800, step 61300
                            out of total 10000000


Iteration:  74%|███████▍  | 7923/10688 [1:52:21<37:38,  1.22it/s]

Iteration:  17%|█▋        | 1772/10688 [25:02<2:04:20,  1.20it/s]

timestamp: 22/06/2020 06:53:28, average loss: 2.065496, time duration: 86.291820,
                            number of examples in current reporting: 800, step 65900
                            out of total 10000000


Iteration:  18%|█▊        | 1871/10688 [26:26<2:03:37,  1.19it/s]

timestamp: 22/06/2020 06:54:53, average loss: 1.969477, time duration: 85.552192,
                            number of examples in current reporting: 800, step 66000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  18%|█▊        | 1972/10688 [27:55<1:58:06,  1.23it/s]

timestamp: 22/06/2020 06:56:22, average loss: 2.050412, time duration: 88.254238,
                            number of examples in current reporting: 800, step 66100
                            out of total 10000000


Iteration:  19%|█▉        | 2072/10688 [29:21<2:03:26,  1.16it/s]

timestamp: 22/06/2020 06:57:47, average loss: 2.057203, time duration: 85.099965,
                            number of examples in current reporting: 800, step 66200
                            out of total 10000000


Iteration:  20%|██        | 2172/10688 [30:45<1:55:45,  1.23it/s]

timestamp: 22/06/2020 06:59:11, average loss: 1.997836, time duration: 84.307386,
                            number of examples in current reporting: 800, step 66300
                            out of total 10000000


Iteration:  21%|██▏       | 2272/10688 [32:10<1:56:46,  1.20it/s]

timestamp: 22/06/2020 07:00:36, average loss: 2.031191, time duration: 84.872480,
                            number of examples in current reporting: 800, step 66400
                            out of total 10000000


Iteration:  22%|██▏       | 2372/10688 [33:34<1:55:16,  1.20it/s]

timestamp: 22/06/2020 07:02:01, average loss: 1.998543, time duration: 84.658266,
                            number of examples in current reporting: 800, step 66500
                            out of total 10000000


Iteration:  23%|██▎       | 2472/10688 [34:59<1:47:32,  1.27it/s]

timestamp: 22/06/2020 07:03:25, average loss: 2.027021, time duration: 84.489302,
                            number of examples in current reporting: 800, step 66600
                            out of total 10000000


Iteration:  24%|██▍       | 2572/10688 [36:24<1:55:50,  1.17it/s]

timestamp: 22/06/2020 07:04:50, average loss: 2.019121, time duration: 85.465532,
                            number of examples in current reporting: 800, step 66700
                            out of total 10000000


Iteration:  25%|██▌       | 2672/10688 [37:49<1:48:36,  1.23it/s]

timestamp: 22/06/2020 07:06:15, average loss: 1.951259, time duration: 84.628109,
                            number of examples in current reporting: 800, step 66800
                            out of total 10000000


Iteration:  26%|██▌       | 2772/10688 [39:14<1:54:49,  1.15it/s]

timestamp: 22/06/2020 07:07:41, average loss: 2.022930, time duration: 85.467268,
                            number of examples in current reporting: 800, step 66900
                            out of total 10000000


Iteration:  27%|██▋       | 2871/10688 [40:38<1:51:56,  1.16it/s]

timestamp: 22/06/2020 07:09:05, average loss: 1.976473, time duration: 84.069590,
                            number of examples in current reporting: 800, step 67000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  28%|██▊       | 2972/10688 [42:06<1:48:22,  1.19it/s]

timestamp: 22/06/2020 07:10:32, average loss: 2.051160, time duration: 87.606853,
                            number of examples in current reporting: 800, step 67100
                            out of total 10000000


Iteration:  29%|██▊       | 3072/10688 [43:30<1:47:18,  1.18it/s]

timestamp: 22/06/2020 07:11:56, average loss: 1.999814, time duration: 83.812565,
                            number of examples in current reporting: 800, step 67200
                            out of total 10000000


Iteration:  30%|██▉       | 3172/10688 [44:55<1:47:08,  1.17it/s]

timestamp: 22/06/2020 07:13:21, average loss: 2.058954, time duration: 85.324725,
                            number of examples in current reporting: 800, step 67300
                            out of total 10000000


Iteration:  31%|███       | 3272/10688 [46:20<1:38:45,  1.25it/s]

timestamp: 22/06/2020 07:14:47, average loss: 1.975290, time duration: 85.212434,
                            number of examples in current reporting: 800, step 67400
                            out of total 10000000


Iteration:  32%|███▏      | 3372/10688 [47:46<1:43:06,  1.18it/s]

timestamp: 22/06/2020 07:16:12, average loss: 2.035864, time duration: 85.601317,
                            number of examples in current reporting: 800, step 67500
                            out of total 10000000


Iteration:  32%|███▏      | 3472/10688 [49:11<1:51:03,  1.08it/s]

timestamp: 22/06/2020 07:17:37, average loss: 2.029684, time duration: 85.280726,
                            number of examples in current reporting: 800, step 67600
                            out of total 10000000


Iteration:  33%|███▎      | 3572/10688 [50:36<1:39:59,  1.19it/s]

timestamp: 22/06/2020 07:19:02, average loss: 2.013795, time duration: 84.805169,
                            number of examples in current reporting: 800, step 67700
                            out of total 10000000


Iteration:  34%|███▍      | 3672/10688 [52:01<1:47:56,  1.08it/s]

timestamp: 22/06/2020 07:20:27, average loss: 2.036698, time duration: 85.050977,
                            number of examples in current reporting: 800, step 67800
                            out of total 10000000


Iteration:  35%|███▌      | 3772/10688 [53:26<1:37:19,  1.18it/s]

timestamp: 22/06/2020 07:21:52, average loss: 2.073126, time duration: 84.789732,
                            number of examples in current reporting: 800, step 67900
                            out of total 10000000


Iteration:  36%|███▌      | 3871/10688 [54:50<1:36:45,  1.17it/s]

timestamp: 22/06/2020 07:23:17, average loss: 2.092818, time duration: 85.284966,
                            number of examples in current reporting: 800, step 68000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  37%|███▋      | 3972/10688 [56:20<1:37:29,  1.15it/s]

timestamp: 22/06/2020 07:24:46, average loss: 2.011018, time duration: 88.356493,
                            number of examples in current reporting: 800, step 68100
                            out of total 10000000


Iteration:  38%|███▊      | 4072/10688 [57:45<1:30:38,  1.22it/s]

timestamp: 22/06/2020 07:26:11, average loss: 2.078803, time duration: 85.105767,
                            number of examples in current reporting: 800, step 68200
                            out of total 10000000


Iteration:  39%|███▉      | 4172/10688 [59:11<1:32:03,  1.18it/s]

timestamp: 22/06/2020 07:27:37, average loss: 2.051455, time duration: 85.800754,
                            number of examples in current reporting: 800, step 68300
                            out of total 10000000


Iteration:  40%|███▉      | 4272/10688 [1:00:35<1:28:21,  1.21it/s]

timestamp: 22/06/2020 07:29:01, average loss: 2.019736, time duration: 84.814887,
                            number of examples in current reporting: 800, step 68400
                            out of total 10000000


Iteration:  41%|████      | 4372/10688 [1:02:00<1:30:48,  1.16it/s]

timestamp: 22/06/2020 07:30:26, average loss: 2.004415, time duration: 84.888393,
                            number of examples in current reporting: 800, step 68500
                            out of total 10000000


Iteration:  42%|████▏     | 4472/10688 [1:03:25<1:26:53,  1.19it/s]

timestamp: 22/06/2020 07:31:51, average loss: 2.018214, time duration: 84.400824,
                            number of examples in current reporting: 800, step 68600
                            out of total 10000000


Iteration:  43%|████▎     | 4572/10688 [1:04:50<1:26:07,  1.18it/s]

timestamp: 22/06/2020 07:33:16, average loss: 1.996752, time duration: 85.468137,
                            number of examples in current reporting: 800, step 68700
                            out of total 10000000


Iteration:  44%|████▎     | 4672/10688 [1:06:16<1:24:29,  1.19it/s]

timestamp: 22/06/2020 07:34:42, average loss: 2.033789, time duration: 86.182884,
                            number of examples in current reporting: 800, step 68800
                            out of total 10000000


Iteration:  45%|████▍     | 4772/10688 [1:07:43<1:30:29,  1.09it/s]

timestamp: 22/06/2020 07:36:09, average loss: 1.997132, time duration: 86.884317,
                            number of examples in current reporting: 800, step 68900
                            out of total 10000000


Iteration:  46%|████▌     | 4871/10688 [1:09:07<1:21:22,  1.19it/s]

timestamp: 22/06/2020 07:37:33, average loss: 1.975251, time duration: 84.159200,
                            number of examples in current reporting: 800, step 69000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  47%|████▋     | 4972/10688 [1:10:35<1:23:01,  1.15it/s]

timestamp: 22/06/2020 07:39:01, average loss: 1.982783, time duration: 87.236810,
                            number of examples in current reporting: 800, step 69100
                            out of total 10000000


Iteration:  47%|████▋     | 5072/10688 [1:12:01<1:19:06,  1.18it/s]

timestamp: 22/06/2020 07:40:27, average loss: 2.057548, time duration: 86.441558,
                            number of examples in current reporting: 800, step 69200
                            out of total 10000000


Iteration:  48%|████▊     | 5172/10688 [1:13:27<1:22:40,  1.11it/s]

timestamp: 22/06/2020 07:41:53, average loss: 1.986134, time duration: 85.702851,
                            number of examples in current reporting: 800, step 69300
                            out of total 10000000


Iteration:  49%|████▉     | 5272/10688 [1:14:53<1:15:34,  1.19it/s]

timestamp: 22/06/2020 07:43:19, average loss: 2.056910, time duration: 85.835881,
                            number of examples in current reporting: 800, step 69400
                            out of total 10000000


Iteration:  50%|█████     | 5372/10688 [1:16:16<1:08:56,  1.29it/s]

timestamp: 22/06/2020 07:44:42, average loss: 2.025858, time duration: 83.493248,
                            number of examples in current reporting: 800, step 69500
                            out of total 10000000


Iteration:  51%|█████     | 5472/10688 [1:17:39<1:13:10,  1.19it/s]

timestamp: 22/06/2020 07:46:05, average loss: 2.012453, time duration: 83.265934,
                            number of examples in current reporting: 800, step 69600
                            out of total 10000000


Iteration:  52%|█████▏    | 5572/10688 [1:19:03<1:09:39,  1.22it/s]

timestamp: 22/06/2020 07:47:29, average loss: 1.999845, time duration: 83.438328,
                            number of examples in current reporting: 800, step 69700
                            out of total 10000000


Iteration:  53%|█████▎    | 5672/10688 [1:20:26<1:05:15,  1.28it/s]

timestamp: 22/06/2020 07:48:52, average loss: 1.984836, time duration: 83.181142,
                            number of examples in current reporting: 800, step 69800
                            out of total 10000000


Iteration:  54%|█████▍    | 5772/10688 [1:21:51<1:07:59,  1.21it/s]

timestamp: 22/06/2020 07:50:17, average loss: 2.016744, time duration: 84.956725,
                            number of examples in current reporting: 800, step 69900
                            out of total 10000000


Iteration:  55%|█████▍    | 5871/10688 [1:23:15<1:09:07,  1.16it/s]

timestamp: 22/06/2020 07:51:42, average loss: 2.045907, time duration: 84.910436,
                            number of examples in current reporting: 800, step 70000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  56%|█████▌    | 5972/10688 [1:24:44<1:03:32,  1.24it/s]

timestamp: 22/06/2020 07:53:10, average loss: 2.024487, time duration: 87.759041,
                            number of examples in current reporting: 800, step 70100
                            out of total 10000000


Iteration:  57%|█████▋    | 6072/10688 [1:26:08<1:03:38,  1.21it/s]

timestamp: 22/06/2020 07:54:34, average loss: 2.031423, time duration: 84.181070,
                            number of examples in current reporting: 800, step 70200
                            out of total 10000000


Iteration:  58%|█████▊    | 6172/10688 [1:27:32<1:01:41,  1.22it/s]

timestamp: 22/06/2020 07:55:59, average loss: 2.038364, time duration: 84.645264,
                            number of examples in current reporting: 800, step 70300
                            out of total 10000000


Iteration:  59%|█████▊    | 6272/10688 [1:28:55<1:04:01,  1.15it/s]

timestamp: 22/06/2020 07:57:22, average loss: 1.967033, time duration: 83.052113,
                            number of examples in current reporting: 800, step 70400
                            out of total 10000000


Iteration:  60%|█████▉    | 6372/10688 [1:30:18<57:54,  1.24it/s]  

timestamp: 22/06/2020 07:58:44, average loss: 2.053621, time duration: 82.864794,
                            number of examples in current reporting: 800, step 70500
                            out of total 10000000


Iteration:  61%|██████    | 6472/10688 [1:31:41<56:31,  1.24it/s]  

timestamp: 22/06/2020 08:00:07, average loss: 2.061050, time duration: 82.865402,
                            number of examples in current reporting: 800, step 70600
                            out of total 10000000


Iteration:  61%|██████▏   | 6572/10688 [1:33:05<55:58,  1.23it/s]  

timestamp: 22/06/2020 08:01:31, average loss: 2.056743, time duration: 83.714007,
                            number of examples in current reporting: 800, step 70700
                            out of total 10000000


Iteration:  62%|██████▏   | 6672/10688 [1:34:28<53:56,  1.24it/s]  

timestamp: 22/06/2020 08:02:55, average loss: 2.013001, time duration: 83.548557,
                            number of examples in current reporting: 800, step 70800
                            out of total 10000000


Iteration:  63%|██████▎   | 6772/10688 [1:35:52<57:45,  1.13it/s]

timestamp: 22/06/2020 08:04:18, average loss: 2.013302, time duration: 83.433421,
                            number of examples in current reporting: 800, step 70900
                            out of total 10000000


Iteration:  64%|██████▍   | 6871/10688 [1:37:16<57:01,  1.12it/s]  

timestamp: 22/06/2020 08:05:43, average loss: 2.033236, time duration: 84.684171,
                            number of examples in current reporting: 800, step 71000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  65%|██████▌   | 6972/10688 [1:38:44<50:05,  1.24it/s]  

timestamp: 22/06/2020 08:07:10, average loss: 1.991394, time duration: 87.120152,
                            number of examples in current reporting: 800, step 71100
                            out of total 10000000


Iteration:  66%|██████▌   | 7072/10688 [1:40:13<56:56,  1.06it/s]  

timestamp: 22/06/2020 08:08:39, average loss: 2.039314, time duration: 89.058316,
                            number of examples in current reporting: 800, step 71200
                            out of total 10000000


Iteration:  67%|██████▋   | 7172/10688 [1:41:41<52:53,  1.11it/s]

timestamp: 22/06/2020 08:10:07, average loss: 1.975965, time duration: 88.057176,
                            number of examples in current reporting: 800, step 71300
                            out of total 10000000


Iteration:  68%|██████▊   | 7272/10688 [1:43:10<50:48,  1.12it/s]

timestamp: 22/06/2020 08:11:36, average loss: 1.996754, time duration: 89.057824,
                            number of examples in current reporting: 800, step 71400
                            out of total 10000000


Iteration:  69%|██████▉   | 7372/10688 [1:44:39<50:58,  1.08it/s]

timestamp: 22/06/2020 08:13:06, average loss: 2.005171, time duration: 89.584631,
                            number of examples in current reporting: 800, step 71500
                            out of total 10000000


Iteration:  70%|██████▉   | 7472/10688 [1:46:05<43:49,  1.22it/s]

timestamp: 22/06/2020 08:14:31, average loss: 2.025710, time duration: 85.295434,
                            number of examples in current reporting: 800, step 71600
                            out of total 10000000


Iteration:  71%|███████   | 7572/10688 [1:47:31<46:42,  1.11it/s]

timestamp: 22/06/2020 08:15:57, average loss: 2.017118, time duration: 86.563640,
                            number of examples in current reporting: 800, step 71700
                            out of total 10000000


Iteration:  72%|███████▏  | 7672/10688 [1:48:57<44:26,  1.13it/s]

timestamp: 22/06/2020 08:17:24, average loss: 2.032768, time duration: 86.101878,
                            number of examples in current reporting: 800, step 71800
                            out of total 10000000


Iteration:  73%|███████▎  | 7772/10688 [1:50:24<39:40,  1.23it/s]

timestamp: 22/06/2020 08:18:50, average loss: 2.014520, time duration: 86.502060,
                            number of examples in current reporting: 800, step 71900
                            out of total 10000000


Iteration:  74%|███████▎  | 7871/10688 [1:51:49<40:16,  1.17it/s]

timestamp: 22/06/2020 08:20:16, average loss: 2.055099, time duration: 86.112032,
                            number of examples in current reporting: 800, step 72000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  75%|███████▍  | 7972/10688 [1:53:18<38:49,  1.17it/s]  

timestamp: 22/06/2020 08:21:44, average loss: 1.978734, time duration: 87.866658,
                            number of examples in current reporting: 800, step 72100
                            out of total 10000000


Iteration:  76%|███████▌  | 8072/10688 [1:54:43<35:06,  1.24it/s]

timestamp: 22/06/2020 08:23:09, average loss: 2.053718, time duration: 85.058517,
                            number of examples in current reporting: 800, step 72200
                            out of total 10000000


Iteration:  76%|███████▋  | 8172/10688 [1:56:09<34:56,  1.20it/s]

timestamp: 22/06/2020 08:24:35, average loss: 2.067921, time duration: 86.049490,
                            number of examples in current reporting: 800, step 72300
                            out of total 10000000


Iteration:  77%|███████▋  | 8272/10688 [1:57:35<35:30,  1.13it/s]

timestamp: 22/06/2020 08:26:01, average loss: 2.025712, time duration: 85.718710,
                            number of examples in current reporting: 800, step 72400
                            out of total 10000000


Iteration:  78%|███████▊  | 8372/10688 [1:59:00<33:32,  1.15it/s]

timestamp: 22/06/2020 08:27:26, average loss: 2.029971, time duration: 85.567611,
                            number of examples in current reporting: 800, step 72500
                            out of total 10000000


Iteration:  79%|███████▉  | 8472/10688 [2:00:25<32:05,  1.15it/s]

timestamp: 22/06/2020 08:28:51, average loss: 1.980051, time duration: 84.353878,
                            number of examples in current reporting: 800, step 72600
                            out of total 10000000


Iteration:  80%|████████  | 8572/10688 [2:01:49<31:29,  1.12it/s]

timestamp: 22/06/2020 08:30:15, average loss: 2.066655, time duration: 84.448952,
                            number of examples in current reporting: 800, step 72700
                            out of total 10000000


Iteration:  81%|████████  | 8672/10688 [2:03:13<28:52,  1.16it/s]

timestamp: 22/06/2020 08:31:39, average loss: 2.082770, time duration: 83.740566,
                            number of examples in current reporting: 800, step 72800
                            out of total 10000000


Iteration:  82%|████████▏ | 8772/10688 [2:04:37<28:38,  1.11it/s]

timestamp: 22/06/2020 08:33:03, average loss: 2.017011, time duration: 84.133814,
                            number of examples in current reporting: 800, step 72900
                            out of total 10000000


Iteration:  83%|████████▎ | 8871/10688 [2:06:01<24:28,  1.24it/s]

timestamp: 22/06/2020 08:34:27, average loss: 2.023815, time duration: 84.360893,
                            number of examples in current reporting: 800, step 73000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  84%|████████▍ | 8972/10688 [2:07:29<25:04,  1.14it/s]

timestamp: 22/06/2020 08:35:55, average loss: 2.035543, time duration: 87.464274,
                            number of examples in current reporting: 800, step 73100
                            out of total 10000000


Iteration:  85%|████████▍ | 9072/10688 [2:08:52<23:00,  1.17it/s]

timestamp: 22/06/2020 08:37:19, average loss: 2.033400, time duration: 83.653217,
                            number of examples in current reporting: 800, step 73200
                            out of total 10000000


Iteration:  86%|████████▌ | 9172/10688 [2:10:15<21:02,  1.20it/s]

timestamp: 22/06/2020 08:38:41, average loss: 1.991978, time duration: 82.895047,
                            number of examples in current reporting: 800, step 73300
                            out of total 10000000


Iteration:  87%|████████▋ | 9272/10688 [2:11:39<18:59,  1.24it/s]

timestamp: 22/06/2020 08:40:06, average loss: 2.022182, time duration: 84.170834,
                            number of examples in current reporting: 800, step 73400
                            out of total 10000000


Iteration:  88%|████████▊ | 9372/10688 [2:13:04<19:31,  1.12it/s]

timestamp: 22/06/2020 08:41:31, average loss: 2.040225, time duration: 84.917528,
                            number of examples in current reporting: 800, step 73500
                            out of total 10000000


Iteration:  89%|████████▊ | 9472/10688 [2:14:28<16:16,  1.25it/s]

timestamp: 22/06/2020 08:42:54, average loss: 2.039030, time duration: 83.959645,
                            number of examples in current reporting: 800, step 73600
                            out of total 10000000


Iteration:  90%|████████▉ | 9572/10688 [2:15:53<15:44,  1.18it/s]

timestamp: 22/06/2020 08:44:19, average loss: 2.028483, time duration: 84.584073,
                            number of examples in current reporting: 800, step 73700
                            out of total 10000000


Iteration:  90%|█████████ | 9672/10688 [2:17:17<14:19,  1.18it/s]

timestamp: 22/06/2020 08:45:43, average loss: 1.973442, time duration: 83.753620,
                            number of examples in current reporting: 800, step 73800
                            out of total 10000000


Iteration:  91%|█████████▏| 9772/10688 [2:18:40<12:04,  1.26it/s]

timestamp: 22/06/2020 08:47:06, average loss: 1.973640, time duration: 83.076477,
                            number of examples in current reporting: 800, step 73900
                            out of total 10000000


Iteration:  92%|█████████▏| 9871/10688 [2:20:06<12:42,  1.07it/s]

timestamp: 22/06/2020 08:48:33, average loss: 1.959612, time duration: 87.537156,
                            number of examples in current reporting: 800, step 74000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  93%|█████████▎| 9972/10688 [2:21:35<09:50,  1.21it/s]

timestamp: 22/06/2020 08:50:01, average loss: 2.002845, time duration: 87.730364,
                            number of examples in current reporting: 800, step 74100
                            out of total 10000000


Iteration:  94%|█████████▍| 10072/10688 [2:23:00<08:36,  1.19it/s]

timestamp: 22/06/2020 08:51:26, average loss: 2.014313, time duration: 84.738279,
                            number of examples in current reporting: 800, step 74200
                            out of total 10000000


Iteration:  95%|█████████▌| 10172/10688 [2:24:23<06:56,  1.24it/s]

timestamp: 22/06/2020 08:52:49, average loss: 2.021579, time duration: 83.556749,
                            number of examples in current reporting: 800, step 74300
                            out of total 10000000


Iteration:  96%|█████████▌| 10272/10688 [2:25:47<05:41,  1.22it/s]

timestamp: 22/06/2020 08:54:13, average loss: 1.975143, time duration: 83.482010,
                            number of examples in current reporting: 800, step 74400
                            out of total 10000000


Iteration:  97%|█████████▋| 10372/10688 [2:27:10<04:19,  1.22it/s]

timestamp: 22/06/2020 08:55:36, average loss: 2.023561, time duration: 83.397763,
                            number of examples in current reporting: 800, step 74500
                            out of total 10000000


Iteration:  98%|█████████▊| 10472/10688 [2:28:34<03:00,  1.20it/s]

timestamp: 22/06/2020 08:57:00, average loss: 2.020532, time duration: 84.015127,
                            number of examples in current reporting: 800, step 74600
                            out of total 10000000


Iteration:  99%|█████████▉| 10572/10688 [2:29:58<01:32,  1.25it/s]

timestamp: 22/06/2020 08:58:25, average loss: 2.027331, time duration: 84.247357,
                            number of examples in current reporting: 800, step 74700
                            out of total 10000000


Iteration: 100%|█████████▉| 10672/10688 [2:31:23<00:14,  1.10it/s]

timestamp: 22/06/2020 08:59:49, average loss: 1.938358, time duration: 84.265539,
                            number of examples in current reporting: 800, step 74800
                            out of total 10000000


Iteration: 100%|██████████| 10688/10688 [2:31:36<00:00,  1.17it/s]
Iteration:   1%|          | 84/10688 [01:13<2:39:34,  1.11it/s]

timestamp: 22/06/2020 09:01:16, average loss: 1.944026, time duration: 87.318267,
                            number of examples in current reporting: 796, step 74900
                            out of total 10000000


Iteration:   2%|▏         | 183/10688 [02:37<2:27:52,  1.18it/s]

timestamp: 22/06/2020 09:02:41, average loss: 1.921366, time duration: 84.782878,
                            number of examples in current reporting: 800, step 75000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:   3%|▎         | 284/10688 [04:06<2:19:36,  1.24it/s]

timestamp: 22/06/2020 09:04:09, average loss: 1.870255, time duration: 88.131534,
                            number of examples in current reporting: 800, step 75100
                            out of total 10000000


Iteration:   4%|▎         | 384/10688 [05:32<2:28:01,  1.16it/s]

timestamp: 22/06/2020 09:05:35, average loss: 1.950613, time duration: 85.633452,
                            number of examples in current reporting: 800, step 75200
                            out of total 10000000


Iteration:   5%|▍         | 484/10688 [06:58<2:24:23,  1.18it/s]

timestamp: 22/06/2020 09:07:00, average loss: 1.945421, time duration: 85.645484,
                            number of examples in current reporting: 800, step 75300
                            out of total 10000000


Iteration:   5%|▌         | 584/10688 [08:23<2:12:30,  1.27it/s]

timestamp: 22/06/2020 09:08:25, average loss: 1.955062, time duration: 84.919071,
                            number of examples in current reporting: 800, step 75400
                            out of total 10000000


Iteration:   6%|▋         | 684/10688 [09:46<2:23:35,  1.16it/s]

timestamp: 22/06/2020 09:09:49, average loss: 1.927381, time duration: 83.640620,
                            number of examples in current reporting: 800, step 75500
                            out of total 10000000


Iteration:   7%|▋         | 784/10688 [11:10<2:24:01,  1.15it/s]

timestamp: 22/06/2020 09:11:13, average loss: 1.884696, time duration: 84.026867,
                            number of examples in current reporting: 800, step 75600
                            out of total 10000000


Iteration:   8%|▊         | 884/10688 [12:36<2:18:04,  1.18it/s]

timestamp: 22/06/2020 09:12:39, average loss: 1.935817, time duration: 85.770984,
                            number of examples in current reporting: 800, step 75700
                            out of total 10000000


Iteration:   9%|▉         | 984/10688 [14:00<2:11:06,  1.23it/s]

timestamp: 22/06/2020 09:14:03, average loss: 1.863481, time duration: 84.122425,
                            number of examples in current reporting: 800, step 75800
                            out of total 10000000


Iteration:  10%|█         | 1084/10688 [15:24<2:16:35,  1.17it/s]

timestamp: 22/06/2020 09:15:27, average loss: 1.923185, time duration: 84.067211,
                            number of examples in current reporting: 800, step 75900
                            out of total 10000000


Iteration:  11%|█         | 1183/10688 [16:48<2:10:58,  1.21it/s]

timestamp: 22/06/2020 09:16:51, average loss: 1.864455, time duration: 84.528862,
                            number of examples in current reporting: 800, step 76000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  12%|█▏        | 1284/10688 [18:17<2:05:24,  1.25it/s]

timestamp: 22/06/2020 09:18:20, average loss: 1.942336, time duration: 88.207323,
                            number of examples in current reporting: 800, step 76100
                            out of total 10000000


Iteration:  13%|█▎        | 1384/10688 [19:43<2:14:56,  1.15it/s]

timestamp: 22/06/2020 09:19:46, average loss: 1.906669, time duration: 86.227342,
                            number of examples in current reporting: 800, step 76200
                            out of total 10000000


Iteration:  14%|█▍        | 1484/10688 [21:08<2:07:41,  1.20it/s]

timestamp: 22/06/2020 09:21:10, average loss: 1.987689, time duration: 84.403107,
                            number of examples in current reporting: 800, step 76300
                            out of total 10000000


Iteration:  15%|█▍        | 1584/10688 [22:33<2:03:52,  1.22it/s]

timestamp: 22/06/2020 09:22:36, average loss: 1.963295, time duration: 85.670346,
                            number of examples in current reporting: 800, step 76400
                            out of total 10000000


Iteration:  16%|█▌        | 1684/10688 [23:59<2:19:03,  1.08it/s]

timestamp: 22/06/2020 09:24:02, average loss: 1.932638, time duration: 85.575994,
                            number of examples in current reporting: 800, step 76500
                            out of total 10000000


Iteration:  17%|█▋        | 1784/10688 [25:24<2:03:43,  1.20it/s]

timestamp: 22/06/2020 09:25:27, average loss: 1.948002, time duration: 85.263189,
                            number of examples in current reporting: 800, step 76600
                            out of total 10000000


Iteration:  18%|█▊        | 1884/10688 [26:49<2:01:21,  1.21it/s]

timestamp: 22/06/2020 09:26:52, average loss: 1.892941, time duration: 85.384413,
                            number of examples in current reporting: 800, step 76700
                            out of total 10000000


Iteration:  19%|█▊        | 1984/10688 [28:14<2:04:12,  1.17it/s]

timestamp: 22/06/2020 09:28:16, average loss: 1.917510, time duration: 84.160105,
                            number of examples in current reporting: 800, step 76800
                            out of total 10000000


Iteration:  19%|█▉        | 2084/10688 [29:37<2:06:44,  1.13it/s]

timestamp: 22/06/2020 09:29:40, average loss: 1.902506, time duration: 83.565979,
                            number of examples in current reporting: 800, step 76900
                            out of total 10000000


Iteration:  20%|██        | 2183/10688 [31:01<2:03:28,  1.15it/s]

timestamp: 22/06/2020 09:31:05, average loss: 1.932377, time duration: 84.814640,
                            number of examples in current reporting: 800, step 77000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  21%|██▏       | 2284/10688 [32:29<1:58:03,  1.19it/s]

timestamp: 22/06/2020 09:32:32, average loss: 1.898877, time duration: 87.306648,
                            number of examples in current reporting: 800, step 77100
                            out of total 10000000


Iteration:  22%|██▏       | 2384/10688 [33:54<2:01:23,  1.14it/s]

timestamp: 22/06/2020 09:33:57, average loss: 1.999313, time duration: 85.159041,
                            number of examples in current reporting: 800, step 77200
                            out of total 10000000


Iteration:  23%|██▎       | 2484/10688 [35:20<1:52:59,  1.21it/s]

timestamp: 22/06/2020 09:35:23, average loss: 1.934344, time duration: 85.638126,
                            number of examples in current reporting: 800, step 77300
                            out of total 10000000


Iteration:  24%|██▍       | 2584/10688 [36:45<1:50:16,  1.22it/s]

timestamp: 22/06/2020 09:36:48, average loss: 1.948947, time duration: 85.355067,
                            number of examples in current reporting: 800, step 77400
                            out of total 10000000


Iteration:  25%|██▌       | 2684/10688 [38:11<1:58:06,  1.13it/s]

timestamp: 22/06/2020 09:38:14, average loss: 1.904084, time duration: 85.902952,
                            number of examples in current reporting: 800, step 77500
                            out of total 10000000


Iteration:  26%|██▌       | 2784/10688 [39:36<1:52:04,  1.18it/s]

timestamp: 22/06/2020 09:39:39, average loss: 1.901223, time duration: 84.606144,
                            number of examples in current reporting: 800, step 77600
                            out of total 10000000


Iteration:  27%|██▋       | 2884/10688 [41:00<1:46:59,  1.22it/s]

timestamp: 22/06/2020 09:41:03, average loss: 1.996751, time duration: 84.174119,
                            number of examples in current reporting: 800, step 77700
                            out of total 10000000


Iteration:  28%|██▊       | 2984/10688 [42:23<1:51:12,  1.15it/s]

timestamp: 22/06/2020 09:42:26, average loss: 1.960098, time duration: 83.312644,
                            number of examples in current reporting: 800, step 77800
                            out of total 10000000


Iteration:  29%|██▉       | 3084/10688 [43:47<1:53:31,  1.12it/s]

timestamp: 22/06/2020 09:43:50, average loss: 1.880870, time duration: 83.615103,
                            number of examples in current reporting: 800, step 77900
                            out of total 10000000


Iteration:  30%|██▉       | 3183/10688 [45:11<1:50:51,  1.13it/s]

timestamp: 22/06/2020 09:45:14, average loss: 1.955082, time duration: 84.441406,
                            number of examples in current reporting: 800, step 78000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  31%|███       | 3284/10688 [46:38<1:46:59,  1.15it/s]

timestamp: 22/06/2020 09:46:41, average loss: 1.938125, time duration: 86.967481,
                            number of examples in current reporting: 800, step 78100
                            out of total 10000000


Iteration:  32%|███▏      | 3384/10688 [48:07<1:50:14,  1.10it/s]

timestamp: 22/06/2020 09:48:10, average loss: 2.003133, time duration: 88.614043,
                            number of examples in current reporting: 800, step 78200
                            out of total 10000000


Iteration:  33%|███▎      | 3484/10688 [49:35<1:42:32,  1.17it/s]

timestamp: 22/06/2020 09:49:38, average loss: 1.936916, time duration: 88.214695,
                            number of examples in current reporting: 800, step 78300
                            out of total 10000000


Iteration:  34%|███▎      | 3584/10688 [51:04<1:35:38,  1.24it/s]

timestamp: 22/06/2020 09:51:06, average loss: 1.979464, time duration: 88.332623,
                            number of examples in current reporting: 800, step 78400
                            out of total 10000000


Iteration:  34%|███▍      | 3684/10688 [52:29<1:42:47,  1.14it/s]

timestamp: 22/06/2020 09:52:32, average loss: 1.955082, time duration: 85.720684,
                            number of examples in current reporting: 800, step 78500
                            out of total 10000000


Iteration:  35%|███▌      | 3784/10688 [53:53<1:31:19,  1.26it/s]

timestamp: 22/06/2020 09:53:55, average loss: 1.946172, time duration: 83.221915,
                            number of examples in current reporting: 800, step 78600
                            out of total 10000000


Iteration:  36%|███▋      | 3884/10688 [55:16<1:38:10,  1.16it/s]

timestamp: 22/06/2020 09:55:19, average loss: 1.962545, time duration: 83.787340,
                            number of examples in current reporting: 800, step 78700
                            out of total 10000000


Iteration:  37%|███▋      | 3984/10688 [56:40<1:42:26,  1.09it/s]

timestamp: 22/06/2020 09:56:43, average loss: 1.915595, time duration: 83.983619,
                            number of examples in current reporting: 800, step 78800
                            out of total 10000000


Iteration:  38%|███▊      | 4084/10688 [58:05<1:32:03,  1.20it/s]

timestamp: 22/06/2020 09:58:08, average loss: 1.914623, time duration: 84.779138,
                            number of examples in current reporting: 800, step 78900
                            out of total 10000000


Iteration:  39%|███▉      | 4183/10688 [59:28<1:25:41,  1.27it/s]

timestamp: 22/06/2020 09:59:32, average loss: 1.937753, time duration: 84.446733,
                            number of examples in current reporting: 800, step 79000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  40%|████      | 4284/10688 [1:00:57<1:27:30,  1.22it/s]

timestamp: 22/06/2020 10:01:00, average loss: 1.947740, time duration: 87.570079,
                            number of examples in current reporting: 800, step 79100
                            out of total 10000000


Iteration:  41%|████      | 4384/10688 [1:02:21<1:26:56,  1.21it/s]

timestamp: 22/06/2020 10:02:24, average loss: 1.895377, time duration: 84.288018,
                            number of examples in current reporting: 800, step 79200
                            out of total 10000000


Iteration:  42%|████▏     | 4484/10688 [1:03:46<1:28:28,  1.17it/s]

timestamp: 22/06/2020 10:03:49, average loss: 1.959609, time duration: 84.789676,
                            number of examples in current reporting: 800, step 79300
                            out of total 10000000


Iteration:  43%|████▎     | 4584/10688 [1:05:11<1:24:50,  1.20it/s]

timestamp: 22/06/2020 10:05:14, average loss: 1.912135, time duration: 84.813603,
                            number of examples in current reporting: 800, step 79400
                            out of total 10000000


Iteration:  44%|████▍     | 4684/10688 [1:06:37<1:28:31,  1.13it/s]

timestamp: 22/06/2020 10:06:40, average loss: 1.949141, time duration: 85.986781,
                            number of examples in current reporting: 800, step 79500
                            out of total 10000000


Iteration:  45%|████▍     | 4784/10688 [1:08:03<1:22:16,  1.20it/s]

timestamp: 22/06/2020 10:08:06, average loss: 1.938557, time duration: 85.847759,
                            number of examples in current reporting: 800, step 79600
                            out of total 10000000


Iteration:  46%|████▌     | 4884/10688 [1:09:28<1:21:55,  1.18it/s]

timestamp: 22/06/2020 10:09:30, average loss: 1.992903, time duration: 84.832307,
                            number of examples in current reporting: 800, step 79700
                            out of total 10000000


Iteration:  47%|████▋     | 4984/10688 [1:10:52<1:19:12,  1.20it/s]

timestamp: 22/06/2020 10:10:55, average loss: 1.955927, time duration: 84.598564,
                            number of examples in current reporting: 800, step 79800
                            out of total 10000000


Iteration:  48%|████▊     | 5084/10688 [1:12:17<1:14:32,  1.25it/s]

timestamp: 22/06/2020 10:12:20, average loss: 1.908271, time duration: 84.533784,
                            number of examples in current reporting: 800, step 79900
                            out of total 10000000


Iteration:  48%|████▊     | 5183/10688 [1:13:40<1:16:49,  1.19it/s]

timestamp: 22/06/2020 10:13:44, average loss: 1.982348, time duration: 84.328021,
                            number of examples in current reporting: 800, step 80000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  49%|████▉     | 5284/10688 [1:15:08<1:22:08,  1.10it/s]

timestamp: 22/06/2020 10:15:10, average loss: 1.935653, time duration: 86.457260,
                            number of examples in current reporting: 800, step 80100
                            out of total 10000000


Iteration:  50%|█████     | 5384/10688 [1:16:32<1:10:54,  1.25it/s]

timestamp: 22/06/2020 10:16:35, average loss: 1.938380, time duration: 84.496841,
                            number of examples in current reporting: 800, step 80200
                            out of total 10000000


Iteration:  51%|█████▏    | 5484/10688 [1:17:57<1:16:54,  1.13it/s]

timestamp: 22/06/2020 10:18:00, average loss: 1.949329, time duration: 85.112242,
                            number of examples in current reporting: 800, step 80300
                            out of total 10000000


Iteration:  52%|█████▏    | 5584/10688 [1:19:22<1:09:35,  1.22it/s]

timestamp: 22/06/2020 10:19:25, average loss: 1.925019, time duration: 84.910347,
                            number of examples in current reporting: 800, step 80400
                            out of total 10000000


Iteration:  53%|█████▎    | 5684/10688 [1:20:45<1:10:48,  1.18it/s]

timestamp: 22/06/2020 10:20:48, average loss: 1.944535, time duration: 83.161214,
                            number of examples in current reporting: 800, step 80500
                            out of total 10000000


Iteration:  54%|█████▍    | 5784/10688 [1:22:10<1:12:39,  1.12it/s]

timestamp: 22/06/2020 10:22:13, average loss: 1.968710, time duration: 84.542291,
                            number of examples in current reporting: 800, step 80600
                            out of total 10000000


Iteration:  55%|█████▌    | 5884/10688 [1:23:35<1:06:11,  1.21it/s]

timestamp: 22/06/2020 10:23:38, average loss: 1.970866, time duration: 85.266966,
                            number of examples in current reporting: 800, step 80700
                            out of total 10000000


Iteration:  56%|█████▌    | 5952/10688 [1:24:33<1:07:45,  1.16it/s]

Iteration:  87%|████████▋ | 9284/10688 [2:11:40<19:48,  1.18it/s]

timestamp: 22/06/2020 11:11:43, average loss: 1.898869, time duration: 88.672905,
                            number of examples in current reporting: 800, step 84100
                            out of total 10000000


Iteration:  88%|████████▊ | 9384/10688 [2:13:05<17:27,  1.24it/s]

timestamp: 22/06/2020 11:13:08, average loss: 1.900477, time duration: 84.671554,
                            number of examples in current reporting: 800, step 84200
                            out of total 10000000


Iteration:  89%|████████▊ | 9484/10688 [2:14:30<18:12,  1.10it/s]

timestamp: 22/06/2020 11:14:33, average loss: 1.947918, time duration: 85.695904,
                            number of examples in current reporting: 800, step 84300
                            out of total 10000000


Iteration:  90%|████████▉ | 9584/10688 [2:15:55<14:21,  1.28it/s]

timestamp: 22/06/2020 11:15:58, average loss: 1.888434, time duration: 84.283991,
                            number of examples in current reporting: 800, step 84400
                            out of total 10000000


Iteration:  91%|█████████ | 9684/10688 [2:17:20<14:36,  1.14it/s]

timestamp: 22/06/2020 11:17:23, average loss: 1.936478, time duration: 85.331075,
                            number of examples in current reporting: 800, step 84500
                            out of total 10000000


Iteration:  92%|█████████▏| 9784/10688 [2:18:46<12:12,  1.23it/s]

timestamp: 22/06/2020 11:18:49, average loss: 1.950601, time duration: 85.793209,
                            number of examples in current reporting: 800, step 84600
                            out of total 10000000


Iteration:  92%|█████████▏| 9884/10688 [2:20:12<10:54,  1.23it/s]

timestamp: 22/06/2020 11:20:14, average loss: 1.894267, time duration: 85.618816,
                            number of examples in current reporting: 800, step 84700
                            out of total 10000000


Iteration:  93%|█████████▎| 9984/10688 [2:21:38<10:50,  1.08it/s]

timestamp: 22/06/2020 11:21:41, average loss: 1.957788, time duration: 86.338947,
                            number of examples in current reporting: 800, step 84800
                            out of total 10000000


Iteration:  94%|█████████▍| 10084/10688 [2:23:03<08:27,  1.19it/s]

timestamp: 22/06/2020 11:23:06, average loss: 1.960712, time duration: 84.968654,
                            number of examples in current reporting: 800, step 84900
                            out of total 10000000


Iteration:  95%|█████████▌| 10183/10688 [2:24:26<07:19,  1.15it/s]

timestamp: 22/06/2020 11:24:30, average loss: 1.959888, time duration: 84.729920,
                            number of examples in current reporting: 800, step 85000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  96%|█████████▌| 10284/10688 [2:25:55<05:32,  1.22it/s]

timestamp: 22/06/2020 11:25:58, average loss: 1.906541, time duration: 87.695254,
                            number of examples in current reporting: 800, step 85100
                            out of total 10000000


Iteration:  97%|█████████▋| 10384/10688 [2:27:21<04:30,  1.12it/s]

timestamp: 22/06/2020 11:27:23, average loss: 1.928507, time duration: 85.268486,
                            number of examples in current reporting: 800, step 85200
                            out of total 10000000


Iteration:  98%|█████████▊| 10484/10688 [2:28:45<02:53,  1.18it/s]

timestamp: 22/06/2020 11:28:48, average loss: 1.939547, time duration: 84.267008,
                            number of examples in current reporting: 800, step 85300
                            out of total 10000000


Iteration:  99%|█████████▉| 10584/10688 [2:30:11<01:27,  1.19it/s]

timestamp: 22/06/2020 11:30:13, average loss: 1.967607, time duration: 85.718747,
                            number of examples in current reporting: 800, step 85400
                            out of total 10000000


Iteration: 100%|█████████▉| 10684/10688 [2:31:37<00:03,  1.19it/s]

timestamp: 22/06/2020 11:31:39, average loss: 1.937145, time duration: 86.189459,
                            number of examples in current reporting: 800, step 85500
                            out of total 10000000


Iteration: 100%|██████████| 10688/10688 [2:31:40<00:00,  1.17it/s]
Iteration:   1%|          | 96/10688 [01:20<2:29:47,  1.18it/s]

timestamp: 22/06/2020 11:33:03, average loss: 1.817609, time duration: 83.459818,
                            number of examples in current reporting: 796, step 85600
                            out of total 10000000


Iteration:   2%|▏         | 196/10688 [02:45<2:23:32,  1.22it/s]

timestamp: 22/06/2020 11:34:28, average loss: 1.865697, time duration: 84.813893,
                            number of examples in current reporting: 800, step 85700
                            out of total 10000000


Iteration:   3%|▎         | 296/10688 [04:09<2:31:16,  1.14it/s]

timestamp: 22/06/2020 11:35:52, average loss: 1.826768, time duration: 84.399637,
                            number of examples in current reporting: 800, step 85800
                            out of total 10000000


Iteration:   4%|▎         | 396/10688 [05:34<2:17:58,  1.24it/s]

timestamp: 22/06/2020 11:37:17, average loss: 1.846638, time duration: 85.320663,
                            number of examples in current reporting: 800, step 85900
                            out of total 10000000


Iteration:   5%|▍         | 495/10688 [07:01<2:34:53,  1.10it/s]

timestamp: 22/06/2020 11:38:45, average loss: 1.837036, time duration: 87.161440,
                            number of examples in current reporting: 800, step 86000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:   6%|▌         | 596/10688 [08:34<2:29:09,  1.13it/s]

timestamp: 22/06/2020 11:40:17, average loss: 1.839394, time duration: 92.146898,
                            number of examples in current reporting: 800, step 86100
                            out of total 10000000


Iteration:   7%|▋         | 696/10688 [09:59<2:13:24,  1.25it/s]

timestamp: 22/06/2020 11:41:42, average loss: 1.819228, time duration: 85.280645,
                            number of examples in current reporting: 800, step 86200
                            out of total 10000000


Iteration:   7%|▋         | 796/10688 [11:24<2:12:44,  1.24it/s]

timestamp: 22/06/2020 11:43:07, average loss: 1.830179, time duration: 85.288141,
                            number of examples in current reporting: 800, step 86300
                            out of total 10000000


Iteration:   8%|▊         | 896/10688 [12:50<2:20:14,  1.16it/s]

timestamp: 22/06/2020 11:44:33, average loss: 1.847515, time duration: 85.916367,
                            number of examples in current reporting: 800, step 86400
                            out of total 10000000


Iteration:   9%|▉         | 996/10688 [14:16<2:18:02,  1.17it/s]

timestamp: 22/06/2020 11:45:59, average loss: 1.848205, time duration: 86.095921,
                            number of examples in current reporting: 800, step 86500
                            out of total 10000000


Iteration:  10%|█         | 1096/10688 [15:47<2:16:22,  1.17it/s]

timestamp: 22/06/2020 11:47:30, average loss: 1.920951, time duration: 90.198676,
                            number of examples in current reporting: 800, step 86600
                            out of total 10000000


Iteration:  11%|█         | 1196/10688 [17:15<2:25:52,  1.08it/s]

timestamp: 22/06/2020 11:48:58, average loss: 1.809042, time duration: 88.207535,
                            number of examples in current reporting: 800, step 86700
                            out of total 10000000


Iteration:  12%|█▏        | 1296/10688 [18:44<2:16:16,  1.15it/s]

timestamp: 22/06/2020 11:50:27, average loss: 1.892757, time duration: 89.188712,
                            number of examples in current reporting: 800, step 86800
                            out of total 10000000


Iteration:  13%|█▎        | 1396/10688 [20:12<2:20:07,  1.11it/s]

timestamp: 22/06/2020 11:51:55, average loss: 1.829745, time duration: 87.688329,
                            number of examples in current reporting: 800, step 86900
                            out of total 10000000


Iteration:  14%|█▍        | 1495/10688 [21:38<2:04:06,  1.23it/s]

timestamp: 22/06/2020 11:53:22, average loss: 1.820100, time duration: 87.071010,
                            number of examples in current reporting: 800, step 87000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  15%|█▍        | 1596/10688 [23:05<2:06:04,  1.20it/s]

timestamp: 22/06/2020 11:54:48, average loss: 1.849293, time duration: 86.588262,
                            number of examples in current reporting: 800, step 87100
                            out of total 10000000


Iteration:  16%|█▌        | 1696/10688 [24:31<2:06:13,  1.19it/s]

timestamp: 22/06/2020 11:56:14, average loss: 1.878055, time duration: 85.266041,
                            number of examples in current reporting: 800, step 87200
                            out of total 10000000


Iteration:  17%|█▋        | 1796/10688 [25:56<2:08:19,  1.15it/s]

timestamp: 22/06/2020 11:57:39, average loss: 1.881783, time duration: 84.986584,
                            number of examples in current reporting: 800, step 87300
                            out of total 10000000


Iteration:  18%|█▊        | 1896/10688 [27:20<2:11:33,  1.11it/s]

timestamp: 22/06/2020 11:59:04, average loss: 1.815030, time duration: 84.982205,
                            number of examples in current reporting: 800, step 87400
                            out of total 10000000


Iteration:  19%|█▊        | 1996/10688 [28:45<1:52:30,  1.29it/s]

timestamp: 22/06/2020 12:00:28, average loss: 1.871012, time duration: 84.203705,
                            number of examples in current reporting: 800, step 87500
                            out of total 10000000


Iteration:  20%|█▉        | 2096/10688 [30:08<2:05:12,  1.14it/s]

timestamp: 22/06/2020 12:01:51, average loss: 1.815330, time duration: 83.591838,
                            number of examples in current reporting: 800, step 87600
                            out of total 10000000


Iteration:  21%|██        | 2196/10688 [31:32<2:07:46,  1.11it/s]

timestamp: 22/06/2020 12:03:16, average loss: 1.847119, time duration: 84.179424,
                            number of examples in current reporting: 800, step 87700
                            out of total 10000000


Iteration:  21%|██▏       | 2296/10688 [32:58<1:57:58,  1.19it/s]

timestamp: 22/06/2020 12:04:41, average loss: 1.848440, time duration: 85.619356,
                            number of examples in current reporting: 800, step 87800
                            out of total 10000000


Iteration:  22%|██▏       | 2396/10688 [34:25<2:01:17,  1.14it/s]

timestamp: 22/06/2020 12:06:08, average loss: 1.834187, time duration: 86.470414,
                            number of examples in current reporting: 800, step 87900
                            out of total 10000000


Iteration:  23%|██▎       | 2495/10688 [35:48<1:52:41,  1.21it/s]

timestamp: 22/06/2020 12:07:32, average loss: 1.812467, time duration: 84.205894,
                            number of examples in current reporting: 800, step 88000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  24%|██▍       | 2596/10688 [37:17<1:54:17,  1.18it/s]

timestamp: 22/06/2020 12:09:00, average loss: 1.875175, time duration: 88.353470,
                            number of examples in current reporting: 800, step 88100
                            out of total 10000000


Iteration:  25%|██▌       | 2696/10688 [38:42<1:49:45,  1.21it/s]

timestamp: 22/06/2020 12:10:25, average loss: 1.822974, time duration: 84.572313,
                            number of examples in current reporting: 800, step 88200
                            out of total 10000000


Iteration:  26%|██▌       | 2796/10688 [40:05<1:44:34,  1.26it/s]

timestamp: 22/06/2020 12:11:48, average loss: 1.823116, time duration: 83.337834,
                            number of examples in current reporting: 800, step 88300
                            out of total 10000000


Iteration:  27%|██▋       | 2896/10688 [41:30<1:48:26,  1.20it/s]

timestamp: 22/06/2020 12:13:13, average loss: 1.895233, time duration: 85.348571,
                            number of examples in current reporting: 800, step 88400
                            out of total 10000000


Iteration:  28%|██▊       | 2996/10688 [42:55<1:50:36,  1.16it/s]

timestamp: 22/06/2020 12:14:38, average loss: 1.792178, time duration: 84.233022,
                            number of examples in current reporting: 800, step 88500
                            out of total 10000000


Iteration:  29%|██▉       | 3096/10688 [44:19<1:47:58,  1.17it/s]

timestamp: 22/06/2020 12:16:02, average loss: 1.887551, time duration: 84.782866,
                            number of examples in current reporting: 800, step 88600
                            out of total 10000000


Iteration:  30%|██▉       | 3196/10688 [45:45<1:38:59,  1.26it/s]

timestamp: 22/06/2020 12:17:28, average loss: 1.908377, time duration: 85.407243,
                            number of examples in current reporting: 800, step 88700
                            out of total 10000000


Iteration:  31%|███       | 3296/10688 [47:10<1:44:40,  1.18it/s]

timestamp: 22/06/2020 12:18:53, average loss: 1.882865, time duration: 85.473613,
                            number of examples in current reporting: 800, step 88800
                            out of total 10000000


Iteration:  32%|███▏      | 3396/10688 [48:37<1:39:51,  1.22it/s]

timestamp: 22/06/2020 12:20:20, average loss: 1.901223, time duration: 86.475805,
                            number of examples in current reporting: 800, step 88900
                            out of total 10000000


Iteration:  33%|███▎      | 3495/10688 [50:00<1:40:57,  1.19it/s]

timestamp: 22/06/2020 12:21:44, average loss: 1.875359, time duration: 84.002522,
                            number of examples in current reporting: 800, step 89000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  34%|███▎      | 3596/10688 [51:29<1:46:10,  1.11it/s]

timestamp: 22/06/2020 12:23:12, average loss: 1.884490, time duration: 88.337252,
                            number of examples in current reporting: 800, step 89100
                            out of total 10000000


Iteration:  35%|███▍      | 3696/10688 [52:54<1:33:19,  1.25it/s]

timestamp: 22/06/2020 12:24:37, average loss: 1.868076, time duration: 84.519885,
                            number of examples in current reporting: 800, step 89200
                            out of total 10000000


Iteration:  36%|███▌      | 3796/10688 [54:18<1:36:21,  1.19it/s]

timestamp: 22/06/2020 12:26:01, average loss: 1.918133, time duration: 84.663270,
                            number of examples in current reporting: 800, step 89300
                            out of total 10000000


Iteration:  36%|███▋      | 3896/10688 [55:43<1:33:20,  1.21it/s]

timestamp: 22/06/2020 12:27:26, average loss: 1.828941, time duration: 84.302500,
                            number of examples in current reporting: 800, step 89400
                            out of total 10000000


Iteration:  37%|███▋      | 3996/10688 [57:09<1:32:42,  1.20it/s]

timestamp: 22/06/2020 12:28:52, average loss: 1.966231, time duration: 86.031051,
                            number of examples in current reporting: 800, step 89500
                            out of total 10000000


Iteration:  38%|███▊      | 4096/10688 [58:33<1:33:54,  1.17it/s]

timestamp: 22/06/2020 12:30:16, average loss: 1.876199, time duration: 84.233297,
                            number of examples in current reporting: 800, step 89600
                            out of total 10000000


Iteration:  39%|███▉      | 4196/10688 [59:57<1:38:05,  1.10it/s]

timestamp: 22/06/2020 12:31:40, average loss: 1.826642, time duration: 84.235000,
                            number of examples in current reporting: 800, step 89700
                            out of total 10000000


Iteration:  40%|████      | 4296/10688 [1:01:21<1:24:15,  1.26it/s]

timestamp: 22/06/2020 12:33:04, average loss: 1.861686, time duration: 84.110453,
                            number of examples in current reporting: 800, step 89800
                            out of total 10000000


Iteration:  41%|████      | 4396/10688 [1:02:46<1:25:44,  1.22it/s]

timestamp: 22/06/2020 12:34:29, average loss: 1.866617, time duration: 84.700613,
                            number of examples in current reporting: 800, step 89900
                            out of total 10000000


Iteration:  42%|████▏     | 4495/10688 [1:04:09<1:22:58,  1.24it/s]

timestamp: 22/06/2020 12:35:53, average loss: 1.831598, time duration: 83.718451,
                            number of examples in current reporting: 800, step 90000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  43%|████▎     | 4596/10688 [1:05:37<1:27:26,  1.16it/s]

timestamp: 22/06/2020 12:37:20, average loss: 1.891030, time duration: 87.798107,
                            number of examples in current reporting: 800, step 90100
                            out of total 10000000


Iteration:  44%|████▍     | 4696/10688 [1:07:01<1:18:15,  1.28it/s]

timestamp: 22/06/2020 12:38:44, average loss: 1.796133, time duration: 83.673025,
                            number of examples in current reporting: 800, step 90200
                            out of total 10000000


Iteration:  45%|████▍     | 4796/10688 [1:08:26<1:26:14,  1.14it/s]

timestamp: 22/06/2020 12:40:09, average loss: 1.925574, time duration: 84.729828,
                            number of examples in current reporting: 800, step 90300
                            out of total 10000000


Iteration:  46%|████▌     | 4896/10688 [1:09:50<1:27:08,  1.11it/s]

timestamp: 22/06/2020 12:41:34, average loss: 1.893016, time duration: 84.654361,
                            number of examples in current reporting: 800, step 90400
                            out of total 10000000


Iteration:  47%|████▋     | 4996/10688 [1:11:15<1:17:42,  1.22it/s]

timestamp: 22/06/2020 12:42:58, average loss: 1.877994, time duration: 84.331953,
                            number of examples in current reporting: 800, step 90500
                            out of total 10000000


Iteration:  48%|████▊     | 5096/10688 [1:12:40<1:16:09,  1.22it/s]

timestamp: 22/06/2020 12:44:23, average loss: 1.876684, time duration: 84.748987,
                            number of examples in current reporting: 800, step 90600
                            out of total 10000000


Iteration:  49%|████▊     | 5196/10688 [1:14:05<1:19:58,  1.14it/s]

timestamp: 22/06/2020 12:45:48, average loss: 1.877601, time duration: 85.654710,
                            number of examples in current reporting: 800, step 90700
                            out of total 10000000


Iteration:  50%|████▉     | 5296/10688 [1:15:30<1:18:43,  1.14it/s]

timestamp: 22/06/2020 12:47:13, average loss: 1.921636, time duration: 85.029142,
                            number of examples in current reporting: 800, step 90800
                            out of total 10000000


Iteration:  50%|█████     | 5396/10688 [1:16:55<1:12:10,  1.22it/s]

timestamp: 22/06/2020 12:48:38, average loss: 1.886036, time duration: 84.947165,
                            number of examples in current reporting: 800, step 90900
                            out of total 10000000


Iteration:  51%|█████▏    | 5495/10688 [1:18:19<1:11:36,  1.21it/s]

timestamp: 22/06/2020 12:50:04, average loss: 1.853087, time duration: 85.380878,
                            number of examples in current reporting: 800, step 91000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  52%|█████▏    | 5596/10688 [1:19:49<1:08:39,  1.24it/s]

timestamp: 22/06/2020 12:51:32, average loss: 1.866659, time duration: 88.250942,
                            number of examples in current reporting: 800, step 91100
                            out of total 10000000


Iteration:  53%|█████▎    | 5696/10688 [1:21:14<1:09:48,  1.19it/s]

timestamp: 22/06/2020 12:52:57, average loss: 1.881193, time duration: 85.059204,
                            number of examples in current reporting: 800, step 91200
                            out of total 10000000


Iteration:  54%|█████▍    | 5796/10688 [1:22:38<1:11:56,  1.13it/s]

timestamp: 22/06/2020 12:54:21, average loss: 1.837919, time duration: 84.392172,
                            number of examples in current reporting: 800, step 91300
                            out of total 10000000


Iteration:  55%|█████▌    | 5896/10688 [1:24:04<1:07:24,  1.18it/s]

timestamp: 22/06/2020 12:55:47, average loss: 1.871101, time duration: 85.446884,
                            number of examples in current reporting: 800, step 91400
                            out of total 10000000


Iteration:  56%|█████▌    | 5996/10688 [1:25:29<1:05:19,  1.20it/s]

timestamp: 22/06/2020 12:57:12, average loss: 1.836164, time duration: 85.550561,
                            number of examples in current reporting: 800, step 91500
                            out of total 10000000


Iteration:  57%|█████▋    | 6096/10688 [1:26:54<1:06:47,  1.15it/s]

timestamp: 22/06/2020 12:58:37, average loss: 1.872014, time duration: 84.551301,
                            number of examples in current reporting: 800, step 91600
                            out of total 10000000


Iteration:  58%|█████▊    | 6196/10688 [1:28:20<1:03:28,  1.18it/s]

timestamp: 22/06/2020 13:00:03, average loss: 1.933673, time duration: 86.305636,
                            number of examples in current reporting: 800, step 91700
                            out of total 10000000


Iteration:  59%|█████▉    | 6296/10688 [1:29:46<1:04:20,  1.14it/s]

timestamp: 22/06/2020 13:01:29, average loss: 1.889679, time duration: 85.995754,
                            number of examples in current reporting: 800, step 91800
                            out of total 10000000


Iteration:  60%|█████▉    | 6396/10688 [1:31:14<1:01:49,  1.16it/s]

timestamp: 22/06/2020 13:02:57, average loss: 1.899564, time duration: 87.836308,
                            number of examples in current reporting: 800, step 91900
                            out of total 10000000


Iteration:  61%|██████    | 6495/10688 [1:32:37<1:01:27,  1.14it/s]

timestamp: 22/06/2020 13:04:21, average loss: 1.879988, time duration: 84.335712,
                            number of examples in current reporting: 800, step 92000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  62%|██████▏   | 6596/10688 [1:34:06<1:02:26,  1.09it/s]

timestamp: 22/06/2020 13:05:49, average loss: 1.841534, time duration: 87.814118,
                            number of examples in current reporting: 800, step 92100
                            out of total 10000000


Iteration:  63%|██████▎   | 6696/10688 [1:35:30<56:48,  1.17it/s]  

timestamp: 22/06/2020 13:07:13, average loss: 1.843665, time duration: 83.633625,
                            number of examples in current reporting: 800, step 92200
                            out of total 10000000


Iteration:  64%|██████▎   | 6796/10688 [1:36:56<56:30,  1.15it/s]  

timestamp: 22/06/2020 13:08:39, average loss: 1.857987, time duration: 86.090294,
                            number of examples in current reporting: 800, step 92300
                            out of total 10000000


Iteration:  65%|██████▍   | 6896/10688 [1:38:23<52:29,  1.20it/s]  

timestamp: 22/06/2020 13:10:06, average loss: 1.841798, time duration: 86.943443,
                            number of examples in current reporting: 800, step 92400
                            out of total 10000000


Iteration:  65%|██████▌   | 6996/10688 [1:39:48<57:17,  1.07it/s]

timestamp: 22/06/2020 13:11:31, average loss: 1.920071, time duration: 85.648241,
                            number of examples in current reporting: 800, step 92500
                            out of total 10000000


Iteration:  66%|██████▋   | 7096/10688 [1:41:13<51:21,  1.17it/s]

timestamp: 22/06/2020 13:12:57, average loss: 1.830102, time duration: 85.071030,
                            number of examples in current reporting: 800, step 92600
                            out of total 10000000


Iteration:  67%|██████▋   | 7196/10688 [1:42:38<51:42,  1.13it/s]

timestamp: 22/06/2020 13:14:22, average loss: 1.898001, time duration: 84.996490,
                            number of examples in current reporting: 800, step 92700
                            out of total 10000000


Iteration:  68%|██████▊   | 7296/10688 [1:44:03<47:00,  1.20it/s]

timestamp: 22/06/2020 13:15:46, average loss: 1.872704, time duration: 84.121661,
                            number of examples in current reporting: 800, step 92800
                            out of total 10000000


Iteration:  69%|██████▉   | 7396/10688 [1:45:27<45:00,  1.22it/s]

timestamp: 22/06/2020 13:17:10, average loss: 1.825909, time duration: 84.121335,
                            number of examples in current reporting: 800, step 92900
                            out of total 10000000


Iteration:  70%|███████   | 7495/10688 [1:46:51<43:28,  1.22it/s]

timestamp: 22/06/2020 13:18:34, average loss: 1.881541, time duration: 84.724596,
                            number of examples in current reporting: 800, step 93000
                            out of total 10000000
saving through pytorch to /home/ge75zam2/finetuning/fine_tuned/bertsumabs.pt


Iteration:  71%|███████   | 7596/10688 [1:48:20<44:25,  1.16it/s]  

timestamp: 22/06/2020 13:20:03, average loss: 1.835755, time duration: 88.196064,
                            number of examples in current reporting: 800, step 93100
                            out of total 10000000


Iteration:  72%|███████▏  | 7696/10688 [1:49:44<41:38,  1.20it/s]

timestamp: 22/06/2020 13:21:27, average loss: 1.862513, time duration: 84.449970,
                            number of examples in current reporting: 800, step 93200
                            out of total 10000000


Iteration:  73%|███████▎  | 7796/10688 [1:51:08<42:19,  1.14it/s]

timestamp: 22/06/2020 13:22:52, average loss: 1.864685, time duration: 84.384364,
                            number of examples in current reporting: 800, step 93300
                            out of total 10000000


Iteration:  74%|███████▍  | 7896/10688 [1:52:32<37:09,  1.25it/s]

timestamp: 22/06/2020 13:24:15, average loss: 1.816142, time duration: 83.536103,
                            number of examples in current reporting: 800, step 93400
                            out of total 10000000


Iteration:  75%|███████▍  | 7996/10688 [1:53:56<38:38,  1.16it/s]

timestamp: 22/06/2020 13:25:40, average loss: 1.871687, time duration: 84.466656,
                            number of examples in current reporting: 800, step 93500
                            out of total 10000000


Iteration:  76%|███████▌  | 8096/10688 [1:55:20<35:36,  1.21it/s]

timestamp: 22/06/2020 13:27:03, average loss: 1.855836, time duration: 83.948067,
                            number of examples in current reporting: 800, step 93600
                            out of total 10000000


Iteration:  77%|███████▋  | 8196/10688 [1:56:46<39:16,  1.06it/s]

timestamp: 22/06/2020 13:28:29, average loss: 1.858310, time duration: 85.463215,
                            number of examples in current reporting: 800, step 93700
                            out of total 10000000


Iteration:  78%|███████▊  | 8296/10688 [1:58:12<33:50,  1.18it/s]

timestamp: 22/06/2020 13:29:55, average loss: 1.867685, time duration: 85.990846,
                            number of examples in current reporting: 800, step 93800
                            out of total 10000000


Iteration:  79%|███████▊  | 8396/10688 [1:59:39<32:11,  1.19it/s]

timestamp: 22/06/2020 13:31:22, average loss: 1.896232, time duration: 86.633912,
                            number of examples in current reporting: 800, step 93900
                            out of total 10000000


Iteration:  79%|███████▉  | 8419/10688 [1:59:58<31:01,  1.22it/s]
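Each report in the log above covers 100 training steps (800 examples) and gives the average loss for that window. A checkpoint is saved every 1,000 steps. To follow the loss trend without scrolling through the raw output, you can parse the reporting lines. Below is a minimal sketch. It assumes the log has been captured as plain text, for example copied from the cell output or redirected to a file. `parse_losses` is a hypothetical helper and not part of the NLP Best Practices utilities.

```python
import re

# Patterns for the two-line reports printed during fine-tuning, e.g.
# "timestamp: ..., average loss: 1.882865, time duration: 85.473613,"
# "    number of examples in current reporting: 800, step 88800"
LOSS_RE = re.compile(r"average loss: ([0-9.]+)")
STEP_RE = re.compile(r"step (\d+)")


def parse_losses(log_text):
    """Return (step, average_loss) pairs from a captured training log (hypothetical helper)."""
    pairs = []
    pending_loss = None
    for line in log_text.splitlines():
        loss_match = LOSS_RE.search(line)
        if loss_match:
            # Remember the loss until the matching "step N" line appears
            pending_loss = float(loss_match.group(1))
            continue
        step_match = STEP_RE.search(line)
        if step_match and pending_loss is not None:
            pairs.append((int(step_match.group(1)), pending_loss))
            pending_loss = None
    return pairs


sample_log = """timestamp: 22/06/2020 12:18:53, average loss: 1.882865, time duration: 85.473613,
                            number of examples in current reporting: 800, step 88800
                            out of total 10000000
timestamp: 22/06/2020 12:20:20, average loss: 1.901223, time duration: 86.475805,
                            number of examples in current reporting: 800, step 88900
                            out of total 10000000
"""
print(parse_losses(sample_log))  # [(88800, 1.882865), (88900, 1.901223)]
```

Progress-bar lines and "saving through pytorch" lines match neither pattern, so the parser ignores them.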

In [None]:
summarizer.save_model(MAX_STEPS, os.path.join("/Users/gjke/Documents/uni/faktual", "bertsumabs.pt"))

## Model Evaluation

To run ROUGE evaluation, refer to the compute_rouge_perl section of [summarization_evaluation.ipynb](summarization_evaluation.ipynb) for setup.
For the settings in this notebook with QUICK_RUN=False, you should get ROUGE scores close to the following numbers:
```
{'rouge-1': {'f': 0.34819639878321873,
             'p': 0.39977932634737307,
             'r': 0.34429079596863604},
 'rouge-2': {'f': 0.13919271352557894,
             'p': 0.16129965067780644,
             'r': 0.1372938054050938},
 'rouge-l': {'f': 0.2313282318854973,
             'p': 0.26664667422849747,
             'r': 0.22850294283399628}}
```

Better performance can be achieved by increasing MAX_STEPS.
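In the scores above, `p`, `r` and `f` are the precision, recall and F-measure of the overlap between generated and reference summaries. ROUGE-1 and ROUGE-2 count overlapping unigrams and bigrams, and ROUGE-L uses the longest common subsequence. For intuition, here is a simplified ROUGE-N in plain Python. `rouge_n` is a hypothetical helper, not the notebook's compute_rouge_perl. It only lowercases and splits on whitespace, with no stemming or the other preprocessing of the official Perl script, so its numbers will not match the table above.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap as precision, recall and F1.

    Hypothetical helper for illustration: whitespace tokenization and
    lowercasing only, single reference, no stemming or stopword handling.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram
    overlap = sum((cand & ref).values())
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 0.0 if overlap == 0 else 2 * p * r / (p + r)
    return {"f": f, "p": p, "r": r}


# Precision, recall and F1 are all 5/6 (about 0.833) for this pair
print(rouge_n("the cat sat on the mat", "the cat is on the mat", n=1))
```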

In [None]:

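# Load the fine-tuned checkpoint onto the CPU, then rebuild the summarizer in test mode
checkpoint = torch.load(os.path.join("/Users/gjke/Documents/uni/faktual", "bertsumabs.pt"), map_location="cpu")
summarizer = BertSumAbs(
    processor, cache_dir=CACHE_PATH, max_pos_length=MAX_POS, test=True
)
# Restore the trained encoder and decoder weights
summarizer.model.load_checkpoint(checkpoint['model'])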

In [None]:
len(test_dataset)

In [None]:
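# Score only the first 10 test examples in quick-run mode, otherwise the full test set
TEST_TOP_N = 10
if not QUICK_RUN:
    TEST_TOP_N = len(test_dataset)

# Scale the batch size with the number of GPUs; use one example per batch on CPU
if NUM_GPUS:
    BATCH_SIZE = NUM_GPUS * BATCH_SIZE_PER_GPU
else:
    BATCH_SIZE = 1
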
shortened_dataset = test_dataset.shorten(top_n=TEST_TOP_N)
src = shortened_dataset.get_source()
reference_summaries = [" ".join(t).rstrip("\n") for t in shortened_dataset.get_target()]
generated_summaries = summarizer.predict(
    shortened_dataset, batch_size=BATCH_SIZE, num_gpus=NUM_GPUS
)
assert len(generated_summaries) == len(reference_summaries)

In [None]:
shortened_dataset.get_source()[3]

In [None]:
generated_summaries[3]
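Because BertSumAbs is abstractive, a generated summary can contain phrasing that does not appear in the source. One rough check when inspecting an example is the fraction of summary n-grams that never occur in the source document. Below is a minimal sketch. `novel_ngram_ratio` is a hypothetical helper, and it uses only whitespace tokenization and lowercasing.

```python
def novel_ngram_ratio(summary, source, n=2):
    """Fraction of the summary's n-grams that never appear in the source.

    Hypothetical helper: 0.0 means every summary n-gram was copied from the
    source; values closer to 1.0 indicate more rephrasing.
    """
    def grams(text):
        toks = text.lower().split()
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

    summary_grams = grams(summary)
    if not summary_grams:
        return 0.0
    source_grams = set(grams(source))
    novel = sum(1 for g in summary_grams if g not in source_grams)
    return novel / len(summary_grams)


print(novel_ngram_ratio("a cat sat", "the cat sat on the mat"))  # 0.5
```

To apply it in this notebook, you might call `novel_ngram_ratio(generated_summaries[3], " ".join(shortened_dataset.get_source()[3]))`. This assumes that `get_source()` returns each document as a list of sentence strings, as the `" ".join` over `get_target()` in the prediction cell suggests.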

## Clean up temporary folders

In [None]:
if os.path.exists(CACHE_PATH):
    shutil.rmtree(CACHE_PATH, ignore_errors=True)