## ⚡️ How to train DeBERTa to reach LB: 0.966
This is the training script I used to train the model in my notebook ["DeBERTa-v3 Single Model LB:0.966"](https://www.kaggle.com/code/emiz6413/deberta-v3-single-model-lb-0-966).

#### Key points are as follows:
1. The (original) training data is split into 4 folds according to `document % 4`

2. [MPWARE's dataset](https://www.kaggle.com/datasets/mpware/pii-mixtral8x7b-generated-essays) is added to each training set <br>
I tried several external datasets (though not very extensively) and found that this one works best.

3. The first 6 layers in the encoder are frozen while the embedding layer is trainable<br>

4. The training and evaluation datasets are truncated to `MAX_LENGTH=3072` tokens <br>
My previous [notebook](https://www.kaggle.com/code/emiz6413/945-947-deberta-v3-base-infer-truncation-false) found that inference on longer input sequences performs better than strided inference, which made me think that longer input sequences during training would also improve performance.

5. Evaluation runs every 50 steps and computes overall and entity-wise F5/recall/precision from predictions and ground truth at the spaCy token level. The best checkpoint by eval/f5 in each fold is selected as the final model.
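
To make the F5 selection criterion from point 5 concrete, here is a minimal sketch of the F-beta formula with `beta=5`. The helper name `fbeta` and the two toy precision/recall pairs are illustrative assumptions; the third call reuses the precision and recall printed in the first evaluation log of fold 0.

```python
def fbeta(precision: float, recall: float, beta: float = 5.0) -> float:
    """F-beta score; with beta=5, recall is weighted far more than precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Toy pairs (made up): swapping precision and recall changes F5 a lot.
high_recall = fbeta(precision=0.5, recall=0.9)     # about 0.873
high_precision = fbeta(precision=0.9, recall=0.5)  # about 0.509

# Precision/recall copied from the first eval log of fold 0.
logged = fbeta(0.8696925329428989, 0.9027355623100304)

print(high_recall, high_precision, logged)
```

Because recall dominates the score, a checkpoint with lower precision can still be selected as the best one as long as its recall is higher.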

#### Note
I found by coincidence that thresholding on the logits instead of the softmax of the logits during evaluation can find better models on the LB, even though inference is done with thresholding on the softmax. Please refer to `MetricsComputer` for the implementation. I haven't figured out why this is the case, so please let me know your thoughts in the comments.
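
The difference between the two thresholding rules can be seen on a toy example. The sketch below re-implements the per-token rule from `create_pred_df` in plain Python; the helper names and the logits are made up for illustration, and it does not explain the LB behaviour.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)


def threshold_pred(scores, o_index, thresh):
    """Per-token version of the rule in create_pred_df."""
    best = argmax(scores)
    without_o = list(scores)
    without_o[o_index] = 0  # the notebook sets the "O" score to 0, not -inf
    best_without_o = argmax(without_o)
    return best_without_o if scores[o_index] < thresh else best


O_INDEX = 2  # toy label set: [entity_a, entity_b, "O"]
toy_logits = [
    [4.0, -1.0, 5.0],   # "O" wins, but with softmax probability below 0.9
    [-3.0, -4.0, 0.5],  # "O" logit below 0.9, all other logits negative
    [6.0, 0.0, 1.0],    # clear entity prediction
]

by_logits = [threshold_pred(t, O_INDEX, 0.9) for t in toy_logits]
by_softmax = [threshold_pred(softmax(t), O_INDEX, 0.9) for t in toy_logits]
print(by_logits)   # [2, 2, 0]
print(by_softmax)  # [0, 2, 0]
```

On raw logits the 0.9 threshold is compared against an unbounded score, so the first token stays "O" even though its softmax probability is only about 0.73. On the second token, zeroing the "O" logit still leaves it as the argmax, because every other logit is negative.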

In [1]:
import json
import copy
import gc
import os
import re
from collections import defaultdict
from pathlib import Path

import torch
from torch import nn
import numpy as np
import pandas as pd
from spacy.lang.en import English
from transformers.tokenization_utils import PreTrainedTokenizerBase
from transformers.models.deberta_v2 import DebertaV2ForTokenClassification, DebertaV2TokenizerFast
from transformers.trainer import Trainer
from transformers.training_args import TrainingArguments
from transformers.trainer_utils import EvalPrediction
from transformers.data.data_collator import DataCollatorForTokenClassification
from datasets import Dataset, DatasetDict, concatenate_datasets
import wandb


## Config & Parameters

In [2]:
TRAINING_MODEL_PATH = "microsoft/deberta-v3-large"
TRAINING_MAX_LENGTH = 3072
EVAL_MAX_LENGTH = 3072
CONF_THRESH = 0.9
LR = 2.5e-5
LR_SCHEDULER_TYPE = "linear"
NUM_EPOCHS = 3
BATCH_SIZE = 1
EVAL_BATCH_SIZE = 1
GRAD_ACCUMULATION_STEPS = 16 // BATCH_SIZE
WARMUP_RATIO = 0.1
WEIGHT_DECAY = 0.01
AMP = True
FREEZE_EMBEDDING = False
FREEZE_LAYERS = 6
N_SPLITS = 4
NEGATIVE_RATIO = 0.3  # down sample ratio of negative samples in the training set
OUTPUT_DIR = "output"
Path(OUTPUT_DIR).mkdir(exist_ok=True)

In [3]:
args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    fp16=AMP,
    learning_rate=LR,
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    gradient_accumulation_steps=GRAD_ACCUMULATION_STEPS,
    gradient_checkpointing=True,
    report_to="none",
    evaluation_strategy="steps",
    eval_steps=50,
    eval_delay=100,
    save_strategy="steps",
    save_steps=50,
    save_total_limit=1,
    logging_steps=10,
    metric_for_best_model="f5",
    greater_is_better=True,
    load_best_model_at_end=True,
    overwrite_output_dir=True,
    lr_scheduler_type=LR_SCHEDULER_TYPE,
    warmup_ratio=WARMUP_RATIO,
    weight_decay=WEIGHT_DECAY,
)
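
As a rough check of what these settings imply, the sketch below derives the number of optimizer steps and the linear warmup/decay schedule. It assumes the 4704 training examples shown in the fold-0 progress bar; `linear_lr` is a hypothetical helper that follows the usual linear-warmup formula, not code from this notebook.

```python
import math

NUM_TRAIN_EXAMPLES = 4704  # assumption: fold-0 train size from the progress bar
BATCH_SIZE = 1
GRAD_ACCUMULATION_STEPS = 16 // BATCH_SIZE
NUM_EPOCHS = 3
WARMUP_RATIO = 0.1
LR = 2.5e-5

batches_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / BATCH_SIZE)
steps_per_epoch = batches_per_epoch // GRAD_ACCUMULATION_STEPS  # 294
total_steps = steps_per_epoch * NUM_EPOCHS                      # 882
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)            # 89


def linear_lr(step: int) -> float:
    """Linear warmup to LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return LR * step / warmup_steps
    return LR * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)
print(linear_lr(9), linear_lr(89), linear_lr(99))
```

The total of 882 steps matches the training progress bar, and the learning rates logged at steps 10 and 100 match this formula evaluated at steps 9 and 99.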

## Dataset Preparation

In [4]:
with Path("input/pii-detection-removal-from-educational-data/train.json").open("r") as f:
    original_data = json.load(f)

with Path("input/pii-mixtral8x7b-generated-essays/mpware_mixtral8x7b_v1.1-no-i-username.json").open("r") as f:
    extra_data = json.load(f)
print("MPWARE's datapoints: ", len(extra_data))

MPWARE's datapoints:  2692


In [5]:
all_labels = [
    'B-EMAIL', 'B-ID_NUM', 'B-NAME_STUDENT', 'B-PHONE_NUM', 'B-STREET_ADDRESS', 'B-URL_PERSONAL', 'B-USERNAME', 'I-ID_NUM', 'I-NAME_STUDENT', 'I-PHONE_NUM', 'I-STREET_ADDRESS', 'I-URL_PERSONAL', 'O'
]
id2label = {i: l for i, l in enumerate(all_labels)}
label2id = {v: k for k, v in id2label.items()}
target = [l for l in all_labels if l != "O"]

## Tokenization

In [6]:
class CustomTokenizer:
    def __init__(self, tokenizer: PreTrainedTokenizerBase, label2id: dict, max_length: int) -> None:
        self.tokenizer = tokenizer
        self.label2id = label2id
        self.max_length = max_length

    def __call__(self, example: dict) -> dict:
        # rebuild text from tokens
        text, labels, token_map = [], [], []

        for idx, (t, l, ws) in enumerate(
            zip(example["tokens"], example["provided_labels"], example["trailing_whitespace"])
        ):
            text.append(t)
            labels.extend([l] * len(t))
            token_map.extend([idx]*len(t))

            if ws:
                text.append(" ")
                labels.append("O")
                token_map.append(-1)

        text = "".join(text)
        labels = np.array(labels)

        # actual tokenization
        tokenized = self.tokenizer(
            text,  # already joined into a single string above
            return_offsets_mapping=True,
            truncation=True,
            max_length=self.max_length
        )

        token_labels = []

        for start_idx, end_idx in tokenized.offset_mapping:
            # CLS token
            if start_idx == 0 and end_idx == 0:
                token_labels.append(self.label2id["O"])
                continue

            # case when token starts with whitespace
            if text[start_idx].isspace():
                start_idx += 1

            token_labels.append(self.label2id[labels[start_idx]])

        length = len(tokenized.input_ids)

        return {**tokenized, "labels": token_labels, "length": length, "token_map": token_map}
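
The character-level alignment used by `CustomTokenizer` can be tried without downloading a tokenizer. In this minimal sketch the helper names, the toy example and the offsets are assumptions; in the notebook the offsets come from `return_offsets_mapping=True`.

```python
def build_char_labels(tokens, labels, trailing_whitespace):
    """Rebuild the text and give every character the label of its token."""
    text, char_labels = [], []
    for tok, lab, ws in zip(tokens, labels, trailing_whitespace):
        text.append(tok)
        char_labels.extend([lab] * len(tok))
        if ws:
            text.append(" ")
            char_labels.append("O")  # whitespace between tokens is "O"
    return "".join(text), char_labels


def label_for_offset(text, char_labels, start, end):
    """Label of a subword, given its (start, end) character offsets."""
    if start == 0 and end == 0:  # special token such as [CLS]
        return "O"
    if text[start].isspace():    # subword that starts with a space
        start += 1
    return char_labels[start]


tokens = ["John", "Doe", "lives", "."]
labels = ["B-NAME_STUDENT", "I-NAME_STUDENT", "O", "O"]
trailing_whitespace = [True, True, False, False]

text, char_labels = build_char_labels(tokens, labels, trailing_whitespace)

# Hypothetical subword offsets for "John", " Doe", " lives"
subword_labels = [
    label_for_offset(text, char_labels, s, e) for s, e in [(0, 4), (4, 8), (8, 14)]
]
print(text, subword_labels)
```

A subword that begins with a space takes the label of the first non-space character, which is why the offset `(4, 8)` resolves to `I-NAME_STUDENT` rather than `O`.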

## Instantiate the dataset

In [7]:
tokenizer = DebertaV2TokenizerFast.from_pretrained(TRAINING_MODEL_PATH)
train_encoder = CustomTokenizer(tokenizer=tokenizer, label2id=label2id, max_length=TRAINING_MAX_LENGTH)
eval_encoder = CustomTokenizer(tokenizer=tokenizer, label2id=label2id, max_length=EVAL_MAX_LENGTH)

ds = DatasetDict()

for key, data in zip(["original", "extra"], [original_data, extra_data]):
    ds[key] = Dataset.from_dict({
        "full_text": [x["full_text"] for x in data],
        "document": [str(x["document"]) for x in data],
        "tokens": [x["tokens"] for x in data],
        "trailing_whitespace": [x["trailing_whitespace"] for x in data],
        "provided_labels": [x["labels"] for x in data],
    })



## Metrics

In [8]:
def find_span(target: list[str], document: list[str]) -> list[list[int]]:
    idx = 0
    spans = []
    span = []

    for i, token in enumerate(document):
        if token != target[idx]:
            idx = 0
            span = []
            continue
        span.append(i)
        idx += 1
        if idx == len(target):
            spans.append(span)
            span = []
            idx = 0
            continue

    return spans


class PRFScore:
    """A precision / recall / F score."""

    def __init__(
        self,
        *,
        tp: int = 0,
        fp: int = 0,
        fn: int = 0,
    ) -> None:
        self.tp = tp
        self.fp = fp
        self.fn = fn

    def __len__(self) -> int:
        return self.tp + self.fp + self.fn

    def __iadd__(self, other):  # in-place add
        self.tp += other.tp
        self.fp += other.fp
        self.fn += other.fn
        return self

    def __add__(self, other):
        return PRFScore(
            tp=self.tp + other.tp, fp=self.fp + other.fp, fn=self.fn + other.fn
        )

    def score_set(self, cand: set, gold: set) -> None:
        self.tp += len(cand.intersection(gold))
        self.fp += len(cand - gold)
        self.fn += len(gold - cand)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp + 1e-100)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn + 1e-100)

    @property
    def f1(self) -> float:
        p = self.precision
        r = self.recall
        return 2 * ((p * r) / (p + r + 1e-100))

    @property
    def f5(self) -> float:
        beta = 5
        p = self.precision
        r = self.recall

        fbeta = (1+(beta**2))*p*r / ((beta**2)*p + r + 1e-100)
        return fbeta

    def to_dict(self) -> dict[str, float]:
        return {"p": self.precision, "r": self.recall, "f5": self.f5}
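
One caveat about `find_span`: on a mismatch it resets without re-checking whether the current token starts a new match. With target `["a", "b"]` and document `["a", "a", "b"]` it therefore seems to return no span. The sliding-window variant below is an alternative sketch under a hypothetical name, not the function used in this notebook. Unlike the original, it also reports overlapping matches.

```python
def find_span_windowed(target: list[str], document: list[str]) -> list[list[int]]:
    """Return the token indices of every occurrence of `target` in `document`."""
    n = len(target)
    if n == 0:
        return []
    return [
        list(range(i, i + n))
        for i in range(len(document) - n + 1)
        if document[i:i + n] == target
    ]


doc = ["call", "555", "-", "1234", "or", "555", "-", "1234"]
phone_spans = find_span_windowed(["555", "-", "1234"], doc)
prefix_case = find_span_windowed(["a", "b"], ["a", "a", "b"])
print(phone_spans, prefix_case)
```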

In [9]:
class MetricsComputer:
    nlp = English()

    def __init__(self, eval_ds: Dataset, label2id: dict, conf_thresh: float = 0.9) -> None:
        self.ds = eval_ds.remove_columns("labels").rename_columns({"provided_labels": "labels"})
        self.gt_df = self.create_gt_df(self.ds)
        self.label2id = label2id
        self.confth = conf_thresh
        self._search_gt()

    def __call__(self, eval_preds: EvalPrediction) -> dict:
        pred_df = self.create_pred_df(eval_preds.predictions)
        return self.compute_metrics_from_df(self.gt_df, pred_df)

    def _search_gt(self) -> None:
        email_regex = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')
        phone_num_regex = re.compile(r"(\(\d{3}\)\d{3}\-\d{4}\w*|\d{3}\.\d{3}\.\d{4})\s")
        self.emails = []
        self.phone_nums = []

        for _data in self.ds:
            # email
            for token_idx, token in enumerate(_data["tokens"]):
                if re.fullmatch(email_regex, token) is not None:
                    self.emails.append(
                        {"document": _data["document"], "token": token_idx, "label": "B-EMAIL", "token_str": token}
                    )
            # phone number
            matches = phone_num_regex.findall(_data["full_text"])
            if not matches:
                continue
            for match in matches:
                target = [t.text for t in self.nlp.tokenizer(match)]
                matched_spans = find_span(target, _data["tokens"])
                # record the spans of every match, not only those of the last one
                for matched_span in matched_spans:
                    for intermediate, token_idx in enumerate(matched_span):
                        prefix = "I" if intermediate else "B"
                        self.phone_nums.append(
                            {"document": _data["document"], "token": token_idx, "label": f"{prefix}-PHONE_NUM", "token_str": _data["tokens"][token_idx]}
                        )

    @staticmethod
    def create_gt_df(ds: Dataset):
        gt = []
        for row in ds:
            for token_idx, (token, label) in enumerate(zip(row["tokens"], row["labels"])):
                if label == "O":
                    continue
                gt.append(
                    {"document": row["document"], "token": token_idx, "label": label, "token_str": token}
                )
        gt_df = pd.DataFrame(gt)
        gt_df["row_id"] = gt_df.index

        return gt_df

    def create_pred_df(self, logits: np.ndarray) -> pd.DataFrame:
        """
        Note:
            Thresholding is done on logits instead of softmax, which could find better models on LB.
        """
        prediction = logits
        o_index = self.label2id["O"]
        preds = prediction.argmax(-1)
        preds_without_o = prediction.copy()
        preds_without_o[:,:,o_index] = 0
        preds_without_o = preds_without_o.argmax(-1)
        o_preds = prediction[:,:,o_index]
        preds_final = np.where(o_preds < self.confth, preds_without_o , preds)

        pairs = set()
        processed = []

        # Iterate over document
        for p_doc, token_map, offsets, tokens, doc in zip(
            preds_final, self.ds["token_map"], self.ds["offset_mapping"], self.ds["tokens"], self.ds["document"]
        ):
            # Iterate over sequence
            for p_token, (start_idx, end_idx) in zip(p_doc, offsets):
                label_pred = id2label[p_token]

                if start_idx + end_idx == 0:
                    # [CLS] token i.e. BOS
                    continue

                if token_map[start_idx] == -1:
                    start_idx += 1

                # ignore "\n\n"
                while start_idx < len(token_map) and tokens[token_map[start_idx]].isspace():
                    start_idx += 1

                if start_idx >= len(token_map):
                    break

                token_id = token_map[start_idx]
                pair = (doc, token_id)

                # ignore "O" predictions; email and phone number predictions are also
                # dropped because the regex matches found in _search_gt are used instead
                if label_pred in ("O", "B-EMAIL", "B-PHONE_NUM", "I-PHONE_NUM") or token_id == -1:
                    continue

                if pair in pairs:
                    continue

                processed.append(
                    {"document": doc, "token": token_id, "label": label_pred, "token_str": tokens[token_id]}
                )
                pairs.add(pair)

        pred_df = pd.DataFrame(processed + self.emails + self.phone_nums)
        pred_df["row_id"] = list(range(len(pred_df)))

        return pred_df

    def compute_metrics_from_df(self, gt_df, pred_df):
        """
        Compute the LB metric (lb) and other auxiliary metrics
        """

        references = {(row.document, row.token, row.label) for row in gt_df.itertuples()}
        predictions = {(row.document, row.token, row.label) for row in pred_df.itertuples()}

        score_per_type = defaultdict(PRFScore)
        references = set(references)

        for ex in predictions:
            pred_type = ex[-1] # (document, token, label)
            if pred_type != 'O':
                pred_type = pred_type[2:] # avoid B- and I- prefix

            if pred_type not in score_per_type:
                score_per_type[pred_type] = PRFScore()

            if ex in references:
                score_per_type[pred_type].tp += 1
                references.remove(ex)
            else:
                score_per_type[pred_type].fp += 1

        for doc, tok, ref_type in references:
            if ref_type != 'O':
                ref_type = ref_type[2:] # avoid B- and I- prefix

            if ref_type not in score_per_type:
                score_per_type[ref_type] = PRFScore()
            score_per_type[ref_type].fn += 1

        totals = PRFScore()

        for prf in score_per_type.values():
            totals += prf

        return {
            "precision": totals.precision,
            "recall": totals.recall,
            "f5": totals.f5,
            **{
                f"{v_k}-{k}": v_v
                for k in set([l[2:] for l in self.label2id.keys() if l!= 'O'])
                for v_k, v_v in score_per_type[k].to_dict().items()
            },
        }
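
The scoring in `compute_metrics_from_df` matches exact `(document, token, label)` triples, so a wrong `B-`/`I-` prefix counts as both a false positive and a false negative. The sketch below shows this on made-up triples; `score_sets` is a hypothetical helper written for this illustration.

```python
from collections import Counter


def score_sets(references: set, predictions: set) -> dict:
    """Count tp/fp/fn per entity type from (document, token, label) triples."""
    counts: dict = {}

    def bucket(label: str) -> Counter:
        return counts.setdefault(label[2:], Counter())  # strip "B-" / "I-"

    for ex in predictions:
        bucket(ex[2])["tp" if ex in references else "fp"] += 1
    for ex in references - predictions:
        bucket(ex[2])["fn"] += 1
    return counts


references = {("1", 0, "B-NAME_STUDENT"), ("1", 1, "I-NAME_STUDENT"), ("2", 5, "B-EMAIL")}
predictions = {("1", 0, "B-NAME_STUDENT"), ("1", 1, "B-NAME_STUDENT"), ("2", 5, "B-EMAIL")}

counts = score_sets(references, predictions)
total = sum(counts.values(), Counter())
precision = total["tp"] / (total["tp"] + total["fp"])
recall = total["tp"] / (total["tp"] + total["fn"])
f5 = 26 * precision * recall / (25 * precision + recall)
print(counts, precision, recall, f5)
```

Here token 1 of document "1" is predicted as `B-NAME_STUDENT` instead of `I-NAME_STUDENT`, which yields one false positive and one false negative for `NAME_STUDENT`.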

## Model

In [10]:
class ModelInit:
    def __init__(
        self,
        checkpoint: str,
        id2label: dict,
        label2id: dict,
        freeze_embedding: bool,
        freeze_layers: int,
    ) -> None:
        self.model = DebertaV2ForTokenClassification.from_pretrained(
            checkpoint,
            num_labels=len(id2label),
            id2label=id2label,
            label2id=label2id,
            ignore_mismatched_sizes=True
        )
        for param in self.model.deberta.embeddings.parameters():
            param.requires_grad = not freeze_embedding
        for layer in self.model.deberta.encoder.layer[:freeze_layers]:
            for param in layer.parameters():
                param.requires_grad = False
        self.weight = copy.deepcopy(self.model.state_dict())

    def __call__(self) -> DebertaV2ForTokenClassification:
        self.model.load_state_dict(self.weight)
        return self.model

model_init = ModelInit(
    TRAINING_MODEL_PATH,
    id2label=id2label,
    label2id=label2id,
    freeze_embedding=FREEZE_EMBEDDING,
    freeze_layers=FREEZE_LAYERS,
)

Some weights of DebertaV2ForTokenClassification were not initialized from the model checkpoint at microsoft/deberta-v3-large and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Split 
Split the original dataset into 4 folds according to `document % 4`. <br>
Only the first 30% of negative samples (documents without any PII label) are kept in the training set, but negatives are NOT excluded from the eval set, so that cross-validation covers the entire training dataset.

In [11]:
# split according to document id
folds = [
    (
        np.array([i for i, d in enumerate(ds["original"]["document"]) if int(d) % N_SPLITS != s]),
        np.array([i for i, d in enumerate(ds["original"]["document"]) if int(d) % N_SPLITS == s])
    )
    for s in range(N_SPLITS)
]

negative_idxs = [i for i, labels in enumerate(ds["original"]["provided_labels"]) if not any(np.array(labels) != "O")]
# a set keeps the `i not in exclude_indices` membership test in the training loop fast
exclude_indices = set(negative_idxs[int(len(negative_idxs) * NEGATIVE_RATIO):])
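
The same split and negative down-sampling logic can be traced on a handful of toy documents. The ids and PII flags below are made up for illustration.

```python
N_SPLITS = 4
NEGATIVE_RATIO = 0.3

documents = ["7", "10", "16", "20", "22", "56", "86", "93"]  # toy document ids
has_pii = [True, False, False, True, False, False, False, False]

# (train indices, eval indices) per fold, keyed by document id modulo N_SPLITS
folds = [
    (
        [i for i, d in enumerate(documents) if int(d) % N_SPLITS != s],
        [i for i, d in enumerate(documents) if int(d) % N_SPLITS == s],
    )
    for s in range(N_SPLITS)
]

# keep only the first 30% of the negative documents for training
negative_idxs = [i for i, flag in enumerate(has_pii) if not flag]
keep = int(len(negative_idxs) * NEGATIVE_RATIO)
exclude = set(negative_idxs[keep:])

train_idx, eval_idx = folds[0]
train_idx_kept = [i for i in train_idx if i not in exclude]
print(train_idx, eval_idx, train_idx_kept)
```

The eval indices of fold 0 still contain negative documents, while the training indices lose every negative beyond the first 30%.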

## Train
Performs cross-validation and saves the best checkpoint's metrics as JSON.
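
Once every fold has written its `eval_result.json`, the per-fold scores can be collected with a few lines. This is a minimal sketch: `summarize_folds` is a hypothetical helper, and the demo writes placeholder values (0.5 and 1.0, not real results) into a temporary directory that mimics the `fold_{i}` layout.

```python
import json
import os
import tempfile


def summarize_folds(output_dir: str, n_splits: int, key: str = "eval_f5"):
    """Read fold_{i}/eval_result.json for each fold and average one metric."""
    scores = []
    for fold_idx in range(n_splits):
        path = os.path.join(output_dir, f"fold_{fold_idx}", "eval_result.json")
        with open(path) as f:
            scores.append(json.load(f)[key])
    return scores, sum(scores) / len(scores)


# Demo with placeholder values only.
with tempfile.TemporaryDirectory() as tmp:
    for i, value in enumerate([0.5, 1.0]):
        os.makedirs(os.path.join(tmp, f"fold_{i}"))
        with open(os.path.join(tmp, f"fold_{i}", "eval_result.json"), "w") as f:
            json.dump({"eval_f5": value}, f)
    scores, mean_f5 = summarize_folds(tmp, n_splits=2)

print(scores, mean_f5)
```

The `eval_` prefix on the metric key matches the keys printed in the evaluation logs.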

In [12]:
for fold_idx, (train_idx, eval_idx) in enumerate(folds):
    args.run_name = f"fold-{fold_idx}"
    args.output_dir = os.path.join(OUTPUT_DIR, f"fold_{fold_idx}")
    original_ds = ds["original"].select([i for i in train_idx if i not in exclude_indices])
    train_ds = concatenate_datasets([original_ds, ds["extra"]])
    train_ds = train_ds.map(train_encoder, num_proc=os.cpu_count())
    eval_ds = ds["original"].select(eval_idx)
    eval_ds = eval_ds.map(eval_encoder, num_proc=os.cpu_count())
    trainer = Trainer(
        args=args,
        model_init=model_init,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        tokenizer=tokenizer,
        compute_metrics=MetricsComputer(eval_ds=eval_ds, label2id=label2id),
        data_collator=DataCollatorForTokenClassification(tokenizer, pad_to_multiple_of=16),
    )
    # break # delete this line to reproduce the result.
    trainer.train()
    eval_res = trainer.evaluate(eval_dataset=eval_ds)
    with open(os.path.join(args.output_dir, "eval_result.json"), "w") as f:
        json.dump(eval_res, f)
    del trainer
    gc.collect()
    torch.cuda.empty_cache()

Map (num_proc=12):   0%|          | 0/4704 [00:00<?, ? examples/s]

Map (num_proc=12):   0%|          | 0/1698 [00:00<?, ? examples/s]

  0%|          | 0/882 [00:00<?, ?it/s]

{'loss': 2.5067, 'learning_rate': 2.5280898876404495e-06, 'epoch': 0.03}
{'loss': 1.5901, 'learning_rate': 5.3370786516853935e-06, 'epoch': 0.07}
{'loss': 0.3077, 'learning_rate': 8.146067415730338e-06, 'epoch': 0.1}
{'loss': 0.0936, 'learning_rate': 1.0955056179775282e-05, 'epoch': 0.14}
{'loss': 0.0525, 'learning_rate': 1.3764044943820225e-05, 'epoch': 0.17}
{'loss': 0.0432, 'learning_rate': 1.657303370786517e-05, 'epoch': 0.2}
{'loss': 0.0273, 'learning_rate': 1.9382022471910114e-05, 'epoch': 0.24}
{'loss': 0.019, 'learning_rate': 2.2191011235955056e-05, 'epoch': 0.27}
{'loss': 0.0132, 'learning_rate': 2.5e-05, 'epoch': 0.31}
{'loss': 0.0088, 'learning_rate': 2.4684741488020177e-05, 'epoch': 0.34}


  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0020956178195774555, 'eval_precision': 0.8696925329428989, 'eval_recall': 0.9027355623100304, 'eval_f5': 0.9014183155314305, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.8928571428571429, 'eval_r-ID_NUM': 0.9615384615384616, 'eval_f5-ID_NUM': 0.9587020648967552, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.8947368421052632, 'eval_r-URL_PERSONAL': 0.68, 'eval_f5-URL_PERSONAL': 0.6863354037267081, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8745980707395499, 'eval_r-NAME_STUDENT': 0.9096989966555183, 'eval_f5-NAME_STUDENT': 0.9082969432314411, 'eval_runtime': 53.4309, 'eval_samples_per_second': 31.779, 'eval_steps_per_second': 31.779, 'epoch': 0.34}
{'loss': 0.0105, 'learning_rate': 2.4369482976040353e-05, 'epoch': 0.37}
{'loss': 0.0074,

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0010274517117068172, 'eval_precision': 0.8362652232746955, 'eval_recall': 0.939209726443769, 'eval_f5': 0.9347838734074116, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-ID_NUM': 0.9615384615384616, 'eval_r-ID_NUM': 0.9615384615384616, 'eval_f5-ID_NUM': 0.9615384615384616, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.78125, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.989345509893455, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8482549317147192, 'eval_r-NAME_STUDENT': 0.9347826086956522, 'eval_f5-NAME_STUDENT': 0.9311294765840222, 'eval_runtime': 53.1235, 'eval_samples_per_second': 31.963, 'eval_steps_per_second': 31.963, 'epoch': 0.51}
{'loss': 0.0051, 'learning_rate': 2.2793190416141236e-05, 'epoch': 0.54}
{'loss': 0.003, 'learning_rate

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0014148126356303692, 'eval_precision': 0.7747183979974969, 'eval_recall': 0.9407294832826748, 'eval_f5': 0.9330395964983478, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.25, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.896551724137931, 'eval_p-ID_NUM': 0.9615384615384616, 'eval_r-ID_NUM': 0.9615384615384616, 'eval_f5-ID_NUM': 0.9615384615384616, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.72, 'eval_r-URL_PERSONAL': 0.72, 'eval_f5-URL_PERSONAL': 0.72, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.78099173553719, 'eval_r-NAME_STUDENT': 0.9481605351170569, 'eval_f5-NAME_STUDENT': 0.9404184741005359, 'eval_runtime': 53.4378, 'eval_samples_per_second': 31.775, 'eval_steps_per_second': 31.775, 'epoch': 0.68}
{'loss': 0.0052, 'learning_rate': 2.121689785624212e-05, 'epoch': 0.71}
{'loss': 0.0026, 'learning_rate'

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0015562425833195448, 'eval_precision': 0.7123595505617978, 'eval_recall': 0.9635258358662614, 'eval_f5': 0.9506343713956171, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-ID_NUM': 0.6666666666666666, 'eval_r-ID_NUM': 0.9230769230769231, 'eval_f5-ID_NUM': 0.9096209912536444, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7352941176470589, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9863429438543246, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.7227101631116688, 'eval_r-NAME_STUDENT': 0.9632107023411371, 'eval_f5-NAME_STUDENT': 0.9510382930081921, 'eval_runtime': 52.9727, 'eval_samples_per_second': 32.054, 'eval_steps_per_second': 32.054, 'epoch': 0.85}
{'loss': 0.0038, 'learning_rate': 1.9640605296343e-05, 'epoch': 0.88}
{'loss': 0.0027, 'le

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0011634572874754667, 'eval_precision': 0.7230590961761297, 'eval_recall': 0.9483282674772037, 'eval_f5': 0.9370992895512041, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9629629629629629, 'eval_p-ID_NUM': 0.9230769230769231, 'eval_r-ID_NUM': 0.9230769230769231, 'eval_f5-ID_NUM': 0.923076923076923, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7333333333333333, 'eval_r-URL_PERSONAL': 0.88, 'eval_f5-URL_PERSONAL': 0.8732824427480916, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.7266922094508301, 'eval_r-NAME_STUDENT': 0.9515050167224081, 'eval_f5-NAME_STUDENT': 0.9403165321299182, 'eval_runtime': 53.1118, 'eval_samples_per_second': 31.97, 'eval_steps_per_second': 31.97, 'epoch': 1.02}
{'loss': 0.0017, 'learning_rate': 1.8064312736443884e-05, 'epoch': 1.05}
{'lo

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.001113023841753602, 'eval_precision': 0.7858048162230672, 'eval_recall': 0.9422492401215805, 'eval_f5': 0.9350890422878357, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9629629629629629, 'eval_p-ID_NUM': 0.8275862068965517, 'eval_r-ID_NUM': 0.9230769230769231, 'eval_f5-ID_NUM': 0.9189985272459499, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7741935483870968, 'eval_r-URL_PERSONAL': 0.96, 'eval_f5-URL_PERSONAL': 0.9512195121951219, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.798581560283688, 'eval_r-NAME_STUDENT': 0.9414715719063546, 'eval_f5-NAME_STUDENT': 0.9350367294793996, 'eval_runtime': 52.7955, 'eval_samples_per_second': 32.162, 'eval_steps_per_second': 32.162, 'epoch': 1.19}
{'loss': 0.0008, 'learning_rate': 1.6488020176544767e-05, 'epoch': 1.22}
{'l

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0009527240763418376, 'eval_precision': 0.8344370860927153, 'eval_recall': 0.9574468085106383, 'eval_f5': 0.952048823016565, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-ID_NUM': 0.8620689655172413, 'eval_r-ID_NUM': 0.9615384615384616, 'eval_f5-ID_NUM': 0.9572901325478645, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.8461538461538461, 'eval_r-URL_PERSONAL': 0.88, 'eval_f5-URL_PERSONAL': 0.8786482334869432, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8428781204111601, 'eval_r-NAME_STUDENT': 0.959866220735786, 'eval_f5-NAME_STUDENT': 0.9547693685624717, 'eval_runtime': 53.4402, 'eval_samples_per_second': 31.774, 'eval_steps_per_second': 31.774, 'epoch': 1.36}
{'loss': 0.0011, 'learning_rate': 1.491172761664565e-05, 'epoch': 1.39}
{'loss': 0.0008, 'l

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.001018580631352961, 'eval_precision': 0.8119325551232166, 'eval_recall': 0.9513677811550152, 'eval_f5': 0.9451251379130133, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.3333333333333333, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9285714285714286, 'eval_p-ID_NUM': 0.8064516129032258, 'eval_r-ID_NUM': 0.9615384615384616, 'eval_f5-ID_NUM': 0.9544787077826726, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7575757575757576, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9878419452887538, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8313782991202346, 'eval_r-NAME_STUDENT': 0.9481605351170569, 'eval_f5-NAME_STUDENT': 0.9430655066530195, 'eval_runtime': 53.3289, 'eval_samples_per_second': 31.84, 'eval_steps_per_second': 31.84, 'epoch': 1.53}
{'loss': 0.0011, 'learning_rate': 1.3335435056746534e-05, 'epoc

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.00106418423820287, 'eval_precision': 0.7658536585365854, 'eval_recall': 0.9544072948328267, 'eval_f5': 0.9454545454545454, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-ID_NUM': 0.9615384615384616, 'eval_r-ID_NUM': 0.9615384615384616, 'eval_f5-ID_NUM': 0.9615384615384616, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 1.0, 'eval_r-URL_PERSONAL': 0.72, 'eval_f5-URL_PERSONAL': 0.7278382581648523, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.7639257294429708, 'eval_r-NAME_STUDENT': 0.9632107023411371, 'eval_f5-NAME_STUDENT': 0.9536423841059603, 'eval_runtime': 53.1863, 'eval_samples_per_second': 31.926, 'eval_steps_per_second': 31.926, 'epoch': 1.7}
{'loss': 0.0007, 'learning_rate': 1.1759142496847415e-05, 'epoch': 1.73}
{'loss': 0.0007, 'learning_rate': 

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0012362874113023281, 'eval_precision': 0.7086092715231788, 'eval_recall': 0.9756838905775076, 'eval_f5': 0.9617423369439964, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.3333333333333333, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9285714285714286, 'eval_p-ID_NUM': 0.8275862068965517, 'eval_r-ID_NUM': 0.9230769230769231, 'eval_f5-ID_NUM': 0.9189985272459499, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7741935483870968, 'eval_r-URL_PERSONAL': 0.96, 'eval_f5-URL_PERSONAL': 0.9512195121951219, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.7125456760048721, 'eval_r-NAME_STUDENT': 0.9782608695652174, 'eval_f5-NAME_STUDENT': 0.964428381206011, 'eval_runtime': 52.961, 'eval_samples_per_second': 32.061, 'eval_steps_per_second': 32.061, 'epoch': 1.87}
{'loss': 0.001, 'learning_rate': 1.0182849936948297e-05, 'epo

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0008639008738100529, 'eval_precision': 0.8166449934980494, 'eval_recall': 0.9544072948328267, 'eval_f5': 0.9482548347755386, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9629629629629629, 'eval_p-ID_NUM': 0.8571428571428571, 'eval_r-ID_NUM': 0.9230769230769231, 'eval_f5-ID_NUM': 0.920353982300885, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.8461538461538461, 'eval_r-URL_PERSONAL': 0.88, 'eval_f5-URL_PERSONAL': 0.8786482334869432, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8280346820809249, 'eval_r-NAME_STUDENT': 0.9581939799331104, 'eval_f5-NAME_STUDENT': 0.9524357499041043, 'eval_runtime': 52.8752, 'eval_samples_per_second': 32.113, 'eval_steps_per_second': 32.113, 'epoch': 2.04}
{'loss': 0.0006, 'learning_rate': 8.60655737704918e-06, 'epoch': 2.07}
{'lo

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0008024873095564544, 'eval_precision': 0.8441734417344173, 'eval_recall': 0.9468085106382979, 'eval_f5': 0.9424016755876191, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9629629629629629, 'eval_p-ID_NUM': 0.9230769230769231, 'eval_r-ID_NUM': 0.9230769230769231, 'eval_f5-ID_NUM': 0.923076923076923, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7931034482758621, 'eval_r-URL_PERSONAL': 0.92, 'eval_f5-URL_PERSONAL': 0.9143730886850153, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8590909090909091, 'eval_r-NAME_STUDENT': 0.9481605351170569, 'eval_f5-NAME_STUDENT': 0.9443946188340807, 'eval_runtime': 53.6941, 'eval_samples_per_second': 31.624, 'eval_steps_per_second': 31.624, 'epoch': 2.21}
{'loss': 0.0007, 'learning_rate': 7.030264817150063e-06, 'epoch': 2.24}

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0010302276350557804, 'eval_precision': 0.7779141104294478, 'eval_recall': 0.9635258358662614, 'eval_f5': 0.9547639733565015, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9629629629629629, 'eval_p-ID_NUM': 0.8571428571428571, 'eval_r-ID_NUM': 0.9230769230769231, 'eval_f5-ID_NUM': 0.920353982300885, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.78125, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.989345509893455, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.7879616963064295, 'eval_r-NAME_STUDENT': 0.9632107023411371, 'eval_f5-NAME_STUDENT': 0.9550411325808303, 'eval_runtime': 52.999, 'eval_samples_per_second': 32.038, 'eval_steps_per_second': 32.038, 'epoch': 2.38}
{'loss': 0.0002, 'learning_rate': 5.453972257250946e-06, 'epoch': 2.41}
{'loss': 0.0006, 

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0010808567749336362, 'eval_precision': 0.7791411042944786, 'eval_recall': 0.9650455927051672, 'eval_f5': 0.9562699102229945, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9629629629629629, 'eval_p-ID_NUM': 0.8888888888888888, 'eval_r-ID_NUM': 0.9230769230769231, 'eval_f5-ID_NUM': 0.9217134416543575, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.78125, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.989345509893455, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.7882513661202186, 'eval_r-NAME_STUDENT': 0.9648829431438127, 'eval_f5-NAME_STUDENT': 0.9566381839051141, 'eval_runtime': 52.989, 'eval_samples_per_second': 32.044, 'eval_steps_per_second': 32.044, 'epoch': 2.55}
{'loss': 0.0005, 'learning_rate': 3.877679697351828e-06, 'epoch': 2.59}
{'loss': 0.0003,

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0008767778635956347, 'eval_precision': 0.8148631029986962, 'eval_recall': 0.9498480243161094, 'eval_f5': 0.9438345820990881, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9629629629629629, 'eval_p-ID_NUM': 0.9259259259259259, 'eval_r-ID_NUM': 0.9615384615384616, 'eval_f5-ID_NUM': 0.9601181683899558, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.78125, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.989345509893455, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.827485380116959, 'eval_r-NAME_STUDENT': 0.9464882943143813, 'eval_f5-NAME_STUDENT': 0.9412818216707177, 'eval_runtime': 53.4681, 'eval_samples_per_second': 31.757, 'eval_steps_per_second': 31.757, 'epoch': 2.72}
{'loss': 0.0005, 'learning_rate': 2.3013871374527115e-06, 'epoch': 2.76}
{'loss': 0.0003

  0%|          | 0/1698 [00:00<?, ?it/s]

{'eval_loss': 0.0008890742319636047, 'eval_precision': 0.8110539845758354, 'eval_recall': 0.958966565349544, 'eval_f5': 0.9522869746923612, 'eval_p-PHONE_NUM': 0.0, 'eval_r-PHONE_NUM': 0.0, 'eval_f5-PHONE_NUM': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9629629629629629, 'eval_p-ID_NUM': 0.9615384615384616, 'eval_r-ID_NUM': 0.9615384615384616, 'eval_f5-ID_NUM': 0.9615384615384616, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.8275862068965517, 'eval_r-URL_PERSONAL': 0.96, 'eval_f5-URL_PERSONAL': 0.9541284403669724, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8197424892703863, 'eval_r-NAME_STUDENT': 0.9581939799331104, 'eval_f5-NAME_STUDENT': 0.9520097130807079, 'eval_runtime': 53.0311, 'eval_samples_per_second': 32.019, 'eval_steps_per_second': 32.019, 'epoch': 2.89}
{'loss': 0.0002, 'learning_rate': 7.250945775535939e-07, 'epoch': 2.93}

  0%|          | 0/1698 [00:00<?, ?it/s]

Map (num_proc=12):   0%|          | 0/4714 [00:00<?, ? examples/s]

Map (num_proc=12):   0%|          | 0/1714 [00:00<?, ? examples/s]

  0%|          | 0/882 [00:00<?, ?it/s]

{'loss': 2.5191, 'learning_rate': 2.5280898876404495e-06, 'epoch': 0.03}
{'loss': 1.5837, 'learning_rate': 5.3370786516853935e-06, 'epoch': 0.07}
{'loss': 0.3097, 'learning_rate': 8.146067415730338e-06, 'epoch': 0.1}
{'loss': 0.1008, 'learning_rate': 1.0955056179775282e-05, 'epoch': 0.14}
{'loss': 0.0509, 'learning_rate': 1.3764044943820225e-05, 'epoch': 0.17}
{'loss': 0.0424, 'learning_rate': 1.657303370786517e-05, 'epoch': 0.2}
{'loss': 0.0277, 'learning_rate': 1.9382022471910114e-05, 'epoch': 0.24}
{'loss': 0.0237, 'learning_rate': 2.2191011235955056e-05, 'epoch': 0.27}
{'loss': 0.0118, 'learning_rate': 2.5e-05, 'epoch': 0.31}
{'loss': 0.0132, 'learning_rate': 2.4684741488020177e-05, 'epoch': 0.34}


  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0038712820969522, 'eval_precision': 0.5344827586206896, 'eval_recall': 0.8942307692307693, 'eval_f5': 0.8716654650324441, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.5416666666666666, 'eval_r-ID_NUM': 0.9629629629629629, 'eval_f5-ID_NUM': 0.9349930843706775, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.5409836065573771, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9683972911963883, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.5322128851540616, 'eval_r-NAME_STUDENT': 0.8823529411764706, 'eval_f5-NAME_STUDENT': 0.8605772022530631, 'eval_runtime': 54.4506, 'eval_samples_per_second': 31.478, 'eval_steps_per_second': 31.478, 'epoch': 0.34}
{'loss': 0.0098, 'learning_rate': 2.4369482976040353e-05, 'epoch': 0.37}
{'loss': 0.0075, 'le

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.00235686800442636, 'eval_precision': 0.7145877378435518, 'eval_recall': 0.9285714285714286, 'eval_f5': 0.917998537553536, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.6666666666666666, 'eval_r-ID_NUM': 0.9629629629629629, 'eval_f5-ID_NUM': 0.946778711484594, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.6206896551724138, 'eval_r-URL_PERSONAL': 0.5454545454545454, 'eval_f5-URL_PERSONAL': 0.5480093676814987, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.71849234393404, 'eval_r-NAME_STUDENT': 0.9442724458204335, 'eval_f5-NAME_STUDENT': 0.9329960585916819, 'eval_runtime': 53.7818, 'eval_samples_per_second': 31.87, 'eval_steps_per_second': 31.87, 'epoch': 0.51}
{'loss': 0.0047, 'learning_rate': 2.2793190416141236e-05, 'epoch': 0.54}

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0020340532064437866, 'eval_precision': 0.652963671128107, 'eval_recall': 0.9381868131868132, 'eval_f5': 0.9226852332952302, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.5882352941176471, 'eval_r-ID_NUM': 0.7407407407407407, 'eval_f5-ID_NUM': 0.7334273624823695, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.4852941176470588, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9608062709966405, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.6659364731653888, 'eval_r-NAME_STUDENT': 0.9411764705882353, 'eval_f5-NAME_STUDENT': 0.9264490417863214, 'eval_runtime': 54.5547, 'eval_samples_per_second': 31.418, 'eval_steps_per_second': 31.418, 'epoch': 0.68}
{'loss': 0.0036, 'learning_rate': 2.121689785624212e-05, 'epoch': 0.71}
{'loss': 0.005, 'le

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0015167933888733387, 'eval_precision': 0.7857142857142857, 'eval_recall': 0.9368131868131868, 'eval_f5': 0.9299349695825466, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.6097560975609756, 'eval_r-ID_NUM': 0.9259259259259259, 'eval_f5-ID_NUM': 0.9078212290502793, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.717391304347826, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9850746268656716, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8037383177570093, 'eval_r-NAME_STUDENT': 0.9318885448916409, 'eval_f5-NAME_STUDENT': 0.926208651399491, 'eval_runtime': 54.2331, 'eval_samples_per_second': 31.604, 'eval_steps_per_second': 31.604, 'epoch': 0.85}
{'loss': 0.0018, 'learning_rate': 1.9640605296343e-05, 'epoch': 0.88}
{'loss': 0.0017, 'lear

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0012661706423386931, 'eval_precision': 0.828009828009828, 'eval_recall': 0.9258241758241759, 'eval_f5': 0.9216366887556537, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.6578947368421053, 'eval_r-ID_NUM': 0.9259259259259259, 'eval_f5-ID_NUM': 0.91164095371669, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.75, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9873417721518988, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8401697312588402, 'eval_r-NAME_STUDENT': 0.9195046439628483, 'eval_f5-NAME_STUDENT': 0.9161772557394554, 'eval_runtime': 54.0156, 'eval_samples_per_second': 31.732, 'eval_steps_per_second': 31.732, 'epoch': 1.02}
{'loss': 0.0016, 'learning_rate': 1.8064312736443884e-05, 'epoch': 1.05}
{'loss': 0.0008, 'learning_rate': 

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0012480069417506456, 'eval_precision': 0.8672086720867209, 'eval_recall': 0.8791208791208791, 'eval_f5': 0.8786566691308481, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.5526315789473685, 'eval_r-ID_NUM': 0.7777777777777778, 'eval_f5-ID_NUM': 0.7657784011220196, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.75, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9873417721518988, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8995215311004785, 'eval_r-NAME_STUDENT': 0.8730650154798761, 'eval_f5-NAME_STUDENT': 0.8740537640817787, 'eval_runtime': 54.592, 'eval_samples_per_second': 31.397, 'eval_steps_per_second': 31.397, 'epoch': 1.19}
{'loss': 0.0013, 'learning_rate': 1.6488020176544767e-05, 'epoch': 1.22}
{'loss': 0.0013, 'learning_rate'

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0010915239108726382, 'eval_precision': 0.8621134020618557, 'eval_recall': 0.9189560439560439, 'eval_f5': 0.9166315345699831, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.6285714285714286, 'eval_r-ID_NUM': 0.8148148148148148, 'eval_f5-ID_NUM': 0.8056338028169013, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.673469387755102, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9816933638443935, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8849028400597907, 'eval_r-NAME_STUDENT': 0.9164086687306502, 'eval_f5-NAME_STUDENT': 0.915155478922647, 'eval_runtime': 53.585, 'eval_samples_per_second': 31.987, 'eval_steps_per_second': 31.987, 'epoch': 1.36}
{'loss': 0.0011, 'learning_rate': 1.491172761664565e-05, 'epoch': 1.39}
{'loss': 0.001, 'lear

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0008985201129689813, 'eval_precision': 0.82875, 'eval_recall': 0.9107142857142857, 'eval_f5': 0.9072631578947369, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.6764705882352942, 'eval_r-ID_NUM': 0.8518518518518519, 'eval_f5-ID_NUM': 0.843441466854725, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 1.0, 'eval_r-URL_PERSONAL': 0.7575757575757576, 'eval_f5-URL_PERSONAL': 0.7647058823529411, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8270571827057183, 'eval_r-NAME_STUDENT': 0.9179566563467493, 'eval_f5-NAME_STUDENT': 0.9140926068654771, 'eval_runtime': 54.4157, 'eval_samples_per_second': 31.498, 'eval_steps_per_second': 31.498, 'epoch': 1.53}
{'loss': 0.0009, 'learning_rate': 1.3335435056746534e-05, 'epoch': 1.56}
{'loss': 0.0017, 'learning_ra

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.001682151691056788, 'eval_precision': 0.7505494505494505, 'eval_recall': 0.9381868131868132, 'eval_f5': 0.9292517006802721, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.6060606060606061, 'eval_r-ID_NUM': 0.7407407407407407, 'eval_f5-ID_NUM': 0.7344632768361582, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.717391304347826, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9850746268656716, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.7590511860174781, 'eval_r-NAME_STUDENT': 0.9411764705882353, 'eval_f5-NAME_STUDENT': 0.9325703498318682, 'eval_runtime': 53.5569, 'eval_samples_per_second': 32.003, 'eval_steps_per_second': 32.003, 'epoch': 1.7}
{'loss': 0.001, 'learning_rate': 1.1759142496847415e-05, 'epoch': 1.73}
{'loss': 0.0006, 'lea

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0010156100615859032, 'eval_precision': 0.8126491646778043, 'eval_recall': 0.9354395604395604, 'eval_f5': 0.9300346675070911, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.6756756756756757, 'eval_r-ID_NUM': 0.9259259259259259, 'eval_f5-ID_NUM': 0.9129213483146068, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7441860465116279, 'eval_r-URL_PERSONAL': 0.9696969696969697, 'eval_f5-URL_PERSONAL': 0.9585253456221197, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8201634877384196, 'eval_r-NAME_STUDENT': 0.9318885448916409, 'eval_f5-NAME_STUDENT': 0.9270315091210612, 'eval_runtime': 54.5363, 'eval_samples_per_second': 31.429, 'eval_steps_per_second': 31.429, 'epoch': 1.87}
{'loss': 0.0014, 'learning_rate': 1.0182849936948297e-05, 'epoch': 1.9}

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0009696885826997459, 'eval_precision': 0.8016528925619835, 'eval_recall': 0.9326923076923077, 'eval_f5': 0.9268651231165013, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.7428571428571429, 'eval_r-ID_NUM': 0.9629629629629629, 'eval_f5-ID_NUM': 0.9521126760563381, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7567567567567568, 'eval_r-URL_PERSONAL': 0.8484848484848485, 'eval_f5-URL_PERSONAL': 0.8445475638051045, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8007968127490039, 'eval_r-NAME_STUDENT': 0.93343653250774, 'eval_f5-NAME_STUDENT': 0.9275276578122225, 'eval_runtime': 54.0819, 'eval_samples_per_second': 31.693, 'eval_steps_per_second': 31.693, 'epoch': 2.04}
{'loss': 0.0008, 'learning_rate': 8.60655737704918e-06, 'epoch': 2.07}

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0014954755315557122, 'eval_precision': 0.8282950423216445, 'eval_recall': 0.9409340659340659, 'eval_f5': 0.936038261417985, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.6578947368421053, 'eval_r-ID_NUM': 0.9259259259259259, 'eval_f5-ID_NUM': 0.91164095371669, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.75, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9873417721518988, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8391123439667129, 'eval_r-NAME_STUDENT': 0.9365325077399381, 'eval_f5-NAME_STUDENT': 0.9323691541698772, 'eval_runtime': 53.7374, 'eval_samples_per_second': 31.896, 'eval_steps_per_second': 31.896, 'epoch': 2.21}
{'loss': 0.0005, 'learning_rate': 7.030264817150063e-06, 'epoch': 2.24}
{'loss': 0.001, 'learning_rate': 6.

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0008133295341394842, 'eval_precision': 0.84472049689441, 'eval_recall': 0.9340659340659341, 'eval_f5': 0.9302815048671403, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.7647058823529411, 'eval_r-ID_NUM': 0.9629629629629629, 'eval_f5-ID_NUM': 0.9534555712270804, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.7674418604651163, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.988479262672811, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8508522727272727, 'eval_r-NAME_STUDENT': 0.9272445820433437, 'eval_f5-NAME_STUDENT': 0.924053637118785, 'eval_runtime': 54.4058, 'eval_samples_per_second': 31.504, 'eval_steps_per_second': 31.504, 'epoch': 2.38}
{'loss': 0.0002, 'learning_rate': 5.453972257250946e-06, 'epoch': 2.41}
{'loss': 0.0005, 'lear

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0008116228273138404, 'eval_precision': 0.8363858363858364, 'eval_recall': 0.9409340659340659, 'eval_f5': 0.9364319890635681, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.7575757575757576, 'eval_r-ID_NUM': 0.9259259259259259, 'eval_f5-ID_NUM': 0.9180790960451977, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.8205128205128205, 'eval_r-URL_PERSONAL': 0.9696969696969697, 'eval_f5-URL_PERSONAL': 0.962962962962963, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8370165745856354, 'eval_r-NAME_STUDENT': 0.9380804953560371, 'eval_f5-NAME_STUDENT': 0.933744221879815, 'eval_runtime': 54.1234, 'eval_samples_per_second': 31.668, 'eval_steps_per_second': 31.668, 'epoch': 2.55}
{'loss': 0.0003, 'learning_rate': 3.877679697351828e-06, 'epoch': 2.58}

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0009317489457316697, 'eval_precision': 0.8388683886838868, 'eval_recall': 0.9368131868131868, 'eval_f5': 0.9326250460211435, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.7222222222222222, 'eval_r-ID_NUM': 0.9629629629629629, 'eval_f5-ID_NUM': 0.9507735583684952, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.75, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9873417721518988, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.8464788732394366, 'eval_r-NAME_STUDENT': 0.9303405572755418, 'eval_f5-NAME_STUDENT': 0.9268090154211152, 'eval_runtime': 54.1277, 'eval_samples_per_second': 31.666, 'eval_steps_per_second': 31.666, 'epoch': 2.72}
{'loss': 0.0005, 'learning_rate': 2.3013871374527115e-06, 'epoch': 2.75}
{'loss': 0.0003, 'learning_rate

  0%|          | 0/1714 [00:00<?, ?it/s]

{'eval_loss': 0.0010035918094217777, 'eval_precision': 0.8200238379022646, 'eval_recall': 0.945054945054945, 'eval_f5': 0.9395451441777404, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-ID_NUM': 0.7142857142857143, 'eval_r-ID_NUM': 0.9259259259259259, 'eval_f5-ID_NUM': 0.9154929577464788, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-URL_PERSONAL': 0.75, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9873417721518988, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-NAME_STUDENT': 0.824966078697422, 'eval_r-NAME_STUDENT': 0.9411764705882353, 'eval_f5-NAME_STUDENT': 0.9361046959199384, 'eval_runtime': 54.5581, 'eval_samples_per_second': 31.416, 'eval_steps_per_second': 31.416, 'epoch': 2.89}
{'loss': 0.0002, 'learning_rate': 7.250945775535939e-07, 'epoch': 2.92}
{'loss': 0.0003, 'learning_rate': 

  0%|          | 0/1714 [00:00<?, ?it/s]

Map (num_proc=12):   0%|          | 0/4736 [00:00<?, ? examples/s]

Map (num_proc=12):   0%|          | 0/1689 [00:00<?, ? examples/s]

  0%|          | 0/888 [00:00<?, ?it/s]

{'loss': 2.5133, 'learning_rate': 2.5280898876404495e-06, 'epoch': 0.03}
{'loss': 1.5828, 'learning_rate': 5.3370786516853935e-06, 'epoch': 0.07}
{'loss': 0.3047, 'learning_rate': 8.146067415730338e-06, 'epoch': 0.1}
{'loss': 0.1062, 'learning_rate': 1.0955056179775282e-05, 'epoch': 0.14}
{'loss': 0.0688, 'learning_rate': 1.3764044943820225e-05, 'epoch': 0.17}
{'loss': 0.0387, 'learning_rate': 1.657303370786517e-05, 'epoch': 0.2}
{'loss': 0.0237, 'learning_rate': 1.9382022471910114e-05, 'epoch': 0.24}
{'loss': 0.0134, 'learning_rate': 2.2191011235955056e-05, 'epoch': 0.27}
{'loss': 0.012, 'learning_rate': 2.5e-05, 'epoch': 0.3}
{'loss': 0.0158, 'learning_rate': 2.4687108886107636e-05, 'epoch': 0.34}


  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.0018785007996484637, 'eval_precision': 0.787531806615776, 'eval_recall': 0.8387533875338753, 'eval_f5': 0.8366604283634852, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 0.25, 'eval_f5-USERNAME': 0.25742574257425743, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 0.8571428571428571, 'eval_f5-ID_NUM': 0.8618784530386739, 'eval_p-STREET_ADDRESS': 0.8571428571428571, 'eval_r-STREET_ADDRESS': 0.8181818181818182, 'eval_f5-STREET_ADDRESS': 0.819614711033275, 'eval_p-URL_PERSONAL': 0.7096774193548387, 'eval_r-URL_PERSONAL': 0.7857142857142857, 'eval_f5-URL_PERSONAL': 0.7824897400820794, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.7834036568213784, 'eval_r-NAME_STUDENT': 0.8426626323751891, 'eval_f5-NAME_STUDENT': 0.8402181480621956, 'eval_runtime': 51.9652, 'eval_samples_per_second': 32.503, 'eval_steps_per_second': 32.503, 'epoch': 0.34}


  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.0020587865728884935, 'eval_precision': 0.8923766816143498, 'eval_recall': 0.8089430894308943, 'eval_f5': 0.8118625451121921, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 0.5, 'eval_f5-USERNAME': 0.5098039215686274, 'eval_p-ID_NUM': 0.875, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.994535519125683, 'eval_p-STREET_ADDRESS': 0.8333333333333334, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9059233449477352, 'eval_p-URL_PERSONAL': 1.0, 'eval_r-URL_PERSONAL': 0.42857142857142855, 'eval_f5-URL_PERSONAL': 0.43820224719101125, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8925619834710744, 'eval_r-NAME_STUDENT': 0.8169440242057489, 'eval_f5-NAME_STUDENT': 0.8196147110332749, 'eval_runtime': 51.5895, 'eval_samples_per_second': 32.739, 'eval_steps_per_second': 32.739, 'epoch': 0.51}
{'loss': 0.0054, 'learning_

  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.0011181195732206106, 'eval_precision': 0.8558786346396966, 'eval_recall': 0.9173441734417345, 'eval_f5': 0.9148173171872565, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.8, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9904761904761905, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-STREET_ADDRESS': 1.0, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9122807017543859, 'eval_p-URL_PERSONAL': 0.6585365853658537, 'eval_r-URL_PERSONAL': 0.9642857142857143, 'eval_f5-URL_PERSONAL': 0.9473684210526314, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8601997146932953, 'eval_r-NAME_STUDENT': 0.9122541603630863, 'eval_f5-NAME_STUDENT': 0.9101358411703241, 'eval_runtime': 52.2943, 'eval_samples_per_second': 32.298, 'eval_steps_per_second': 32.298, 'epoch': 0.68}
{'loss': 0.0045, 'learning_rate': 2.124530663

  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.001330066006630659, 'eval_precision': 0.9261538461538461, 'eval_recall': 0.8157181571815718, 'eval_f5': 0.8194764397905759, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 0.75, 'eval_f5-USERNAME': 0.7358490566037735, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 0.7857142857142857, 'eval_f5-ID_NUM': 0.7922437673130194, 'eval_p-STREET_ADDRESS': 0.9090909090909091, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9090909090909092, 'eval_p-URL_PERSONAL': 1.0, 'eval_r-URL_PERSONAL': 0.6785714285714286, 'eval_f5-URL_PERSONAL': 0.6870653685674548, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.9278350515463918, 'eval_r-NAME_STUDENT': 0.8169440242057489, 'eval_f5-NAME_STUDENT': 0.8207166656924066, 'eval_runtime': 51.7433, 'eval_samples_per_second': 32.642, 'eval_steps_per_second': 32.642, 'epoch': 0.84}
{'loss': 0.0035

  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.0012854061787948012, 'eval_precision': 0.8753462603878116, 'eval_recall': 0.8563685636856369, 'eval_f5': 0.8570832464010014, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-STREET_ADDRESS': 0.7916666666666666, 'eval_r-STREET_ADDRESS': 0.8636363636363636, 'eval_f5-STREET_ADDRESS': 0.8606271777003486, 'eval_p-URL_PERSONAL': 0.627906976744186, 'eval_r-URL_PERSONAL': 0.9642857142857143, 'eval_f5-URL_PERSONAL': 0.9448183041722747, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8915470494417863, 'eval_r-NAME_STUDENT': 0.8456883509833586, 'eval_f5-NAME_STUDENT': 0.8473647388059703, 'eval_runtime': 51.9312, 'eval_samples_per_second': 32.524, 'eval_steps_per_second': 32.524, 'epoch': 1.01}
{'loss': 0.0012, 'learning_rate': 1.8116395494

  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.001734858495183289, 'eval_precision': 0.6773888363292336, 'eval_recall': 0.9701897018970189, 'eval_f5': 0.9543240887886401, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.6666666666666666, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9811320754716982, 'eval_p-ID_NUM': 0.8125, 'eval_r-ID_NUM': 0.9285714285714286, 'eval_f5-ID_NUM': 0.923497267759563, 'eval_p-STREET_ADDRESS': 0.6785714285714286, 'eval_r-STREET_ADDRESS': 0.8636363636363636, 'eval_f5-STREET_ADDRESS': 0.8546712802768166, 'eval_p-URL_PERSONAL': 0.88, 'eval_r-URL_PERSONAL': 0.7857142857142857, 'eval_f5-URL_PERSONAL': 0.7889655172413793, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.6676954732510288, 'eval_r-NAME_STUDENT': 0.9818456883509834, 'eval_f5-NAME_STUDENT': 0.9643938960964736, 'eval_runtime': 52.3299, 'eval_samples_per_second': 32.276, 'eval_steps_per_second': 32.276, 'epoch': 1.18}

  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.000959223136305809, 'eval_precision': 0.8923959827833573, 'eval_recall': 0.8428184281842819, 'eval_f5': 0.8446231785658328, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-ID_NUM': 0.875, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.994535519125683, 'eval_p-STREET_ADDRESS': 0.9090909090909091, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9090909090909092, 'eval_p-URL_PERSONAL': 0.6428571428571429, 'eval_r-URL_PERSONAL': 0.9642857142857143, 'eval_f5-URL_PERSONAL': 0.9460916442048517, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.9087893864013267, 'eval_r-NAME_STUDENT': 0.8290468986384266, 'eval_f5-NAME_STUDENT': 0.8318542737038767, 'eval_runtime': 52.2168, 'eval_samples_per_second': 32.346, 'eval_steps_per_second': 32.346, 'epoch': 1.35}
{'loss': 0.0008, 'learning_rat

  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.0013964541722089052, 'eval_precision': 0.8184143222506394, 'eval_recall': 0.8672086720867209, 'eval_f5': 0.8652246256239601, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.6666666666666666, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9811320754716982, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 0.8571428571428571, 'eval_f5-ID_NUM': 0.8618784530386739, 'eval_p-STREET_ADDRESS': 0.8695652173913043, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9075043630017451, 'eval_p-URL_PERSONAL': 0.6428571428571429, 'eval_r-URL_PERSONAL': 0.9642857142857143, 'eval_f5-URL_PERSONAL': 0.9460916442048517, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8243831640058055, 'eval_r-NAME_STUDENT': 0.859304084720121, 'eval_f5-NAME_STUDENT': 0.8579063552922039, 'eval_runtime': 51.614, 'eval_samples_per_second': 32.724, 'eval_steps_per_second': 32.724, 'e

  0%|          | 0/1689 [00:00<?, ?it/s]

{'eval_loss': 0.001606619800440967, 'eval_precision': 0.668241965973535, 'eval_recall': 0.9579945799457995, 'eval_f5': 0.9422800902193972, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.6666666666666666, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9811320754716982, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 0.8571428571428571, 'eval_f5-ID_NUM': 0.8618784530386739, 'eval_p-STREET_ADDRESS': 0.8695652173913043, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9075043630017451, 'eval_p-URL_PERSONAL': 0.6842105263157895, 'eval_r-URL_PERSONAL': 0.9285714285714286, 'eval_f5-URL_PERSONAL': 0.915989159891599, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.6563467492260062, 'eval_r-NAME_STUDENT': 0.962178517397882, 'eval_f5-NAME_STUDENT': 0.9452383674402652, 'eval_runtime': 52.0258, 'eval_samples_per_second': 32.465, 'eval_steps_per_second': 32.465, 'epo

{'eval_loss': 0.0010481404606252909, 'eval_precision': 0.8171641791044776, 'eval_recall': 0.8902439024390244, 'eval_f5': 0.8871922717357432, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-ID_NUM': 0.8235294117647058, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9918256130790191, 'eval_p-STREET_ADDRESS': 0.8, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9043478260869565, 'eval_p-URL_PERSONAL': 0.7647058823529411, 'eval_r-URL_PERSONAL': 0.9285714285714286, 'eval_f5-URL_PERSONAL': 0.9209809264305179, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8179271708683473, 'eval_r-NAME_STUDENT': 0.8835098335854765, 'eval_f5-NAME_STUDENT': 0.8807935495098324, 'eval_runtime': 52.2813, 'eval_samples_per_second': 32.306, 'eval_steps_per_second': 32.306, 'epoch': 1.86}
{'loss': 0.0014, 'learning_rat

{'eval_loss': 0.0010781948221847415, 'eval_precision': 0.826362484157161, 'eval_recall': 0.8834688346883469, 'eval_f5': 0.881126877696346, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.8, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9904761904761905, 'eval_p-ID_NUM': 0.8666666666666667, 'eval_r-ID_NUM': 0.9285714285714286, 'eval_f5-ID_NUM': 0.9260273972602742, 'eval_p-STREET_ADDRESS': 0.7037037037037037, 'eval_r-STREET_ADDRESS': 0.8636363636363636, 'eval_f5-STREET_ADDRESS': 0.856152512998267, 'eval_p-URL_PERSONAL': 0.8666666666666667, 'eval_r-URL_PERSONAL': 0.9285714285714286, 'eval_f5-URL_PERSONAL': 0.9260273972602742, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8276353276353277, 'eval_r-NAME_STUDENT': 0.8789712556732224, 'eval_f5-NAME_STUDENT': 0.8768793173506706, 'eval_runtime': 51.8877, 'eval_samples_per_second': 32.551, 'eval_steps_per_second': 32.551, 'ep

{'eval_loss': 0.0012677829945459962, 'eval_precision': 0.7975609756097561, 'eval_recall': 0.8861788617886179, 'eval_f5': 0.8824078879086664, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.6666666666666666, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9811320754716982, 'eval_p-ID_NUM': 0.8571428571428571, 'eval_r-ID_NUM': 0.8571428571428571, 'eval_f5-ID_NUM': 0.8571428571428571, 'eval_p-STREET_ADDRESS': 0.8695652173913043, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9075043630017451, 'eval_p-URL_PERSONAL': 0.6585365853658537, 'eval_r-URL_PERSONAL': 0.9642857142857143, 'eval_f5-URL_PERSONAL': 0.9473684210526314, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8016528925619835, 'eval_r-NAME_STUDENT': 0.8804841149773072, 'eval_f5-NAME_STUDENT': 0.8771665410700829, 'eval_runtime': 52.3045, 'eval_samples_per_second': 32.292, 'eval_steps_per_se

{'eval_loss': 0.0010749534703791142, 'eval_precision': 0.806060606060606, 'eval_recall': 0.9010840108401084, 'eval_f5': 0.8970168612191958, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.8, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9904761904761905, 'eval_p-ID_NUM': 0.8666666666666667, 'eval_r-ID_NUM': 0.9285714285714286, 'eval_f5-ID_NUM': 0.9260273972602742, 'eval_p-STREET_ADDRESS': 0.8695652173913043, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9075043630017451, 'eval_p-URL_PERSONAL': 0.8125, 'eval_r-URL_PERSONAL': 0.9285714285714286, 'eval_f5-URL_PERSONAL': 0.923497267759563, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8013513513513514, 'eval_r-NAME_STUDENT': 0.897125567322239, 'eval_f5-NAME_STUDENT': 0.8930205618302924, 'eval_runtime': 51.7432, 'eval_samples_per_second': 32.642, 'eval_steps_per_second': 32.642, 'epoch': 2.36}


{'eval_loss': 0.0009519105078652501, 'eval_precision': 0.879286694101509, 'eval_recall': 0.8685636856368564, 'eval_f5': 0.8689712706606184, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.6666666666666666, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9811320754716982, 'eval_p-ID_NUM': 0.8571428571428571, 'eval_r-ID_NUM': 0.8571428571428571, 'eval_f5-ID_NUM': 0.8571428571428571, 'eval_p-STREET_ADDRESS': 0.8333333333333334, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9059233449477352, 'eval_p-URL_PERSONAL': 0.8125, 'eval_r-URL_PERSONAL': 0.9285714285714286, 'eval_f5-URL_PERSONAL': 0.923497267759563, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8864696734059098, 'eval_r-NAME_STUDENT': 0.8623298033282905, 'eval_f5-NAME_STUDENT': 0.8632339235787512, 'eval_runtime': 52.544, 'eval_samples_per_second': 32.145, 'eval_steps_per_second': 32.145, 

{'eval_loss': 0.0009518184815533459, 'eval_precision': 0.8844566712517193, 'eval_recall': 0.8712737127371274, 'eval_f5': 0.871773478646295, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.8, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9904761904761905, 'eval_p-ID_NUM': 0.8666666666666667, 'eval_r-ID_NUM': 0.9285714285714286, 'eval_f5-ID_NUM': 0.9260273972602742, 'eval_p-STREET_ADDRESS': 0.8695652173913043, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9075043630017451, 'eval_p-URL_PERSONAL': 0.8387096774193549, 'eval_r-URL_PERSONAL': 0.9285714285714286, 'eval_f5-URL_PERSONAL': 0.9247606019151848, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8880248833592534, 'eval_r-NAME_STUDENT': 0.8638426626323752, 'eval_f5-NAME_STUDENT': 0.8647483690587139, 'eval_runtime': 51.8597, 'eval_samples_per_second': 32.569, 'eval_steps_per_second': 32.569, '

{'eval_loss': 0.0010469290427863598, 'eval_precision': 0.8238993710691824, 'eval_recall': 0.8875338753387534, 'eval_f5': 0.8849051701740712, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_p-USERNAME': 0.8, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9904761904761905, 'eval_p-ID_NUM': 0.8666666666666667, 'eval_r-ID_NUM': 0.9285714285714286, 'eval_f5-ID_NUM': 0.9260273972602742, 'eval_p-STREET_ADDRESS': 0.8333333333333334, 'eval_r-STREET_ADDRESS': 0.9090909090909091, 'eval_f5-STREET_ADDRESS': 0.9059233449477352, 'eval_p-URL_PERSONAL': 0.7647058823529411, 'eval_r-URL_PERSONAL': 0.9285714285714286, 'eval_f5-URL_PERSONAL': 0.9209809264305179, 'eval_p-EMAIL': 0.8888888888888888, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 0.9952153110047847, 'eval_p-NAME_STUDENT': 0.8246110325318247, 'eval_r-NAME_STUDENT': 0.8819969742813918, 'eval_f5-NAME_STUDENT': 0.8796425255338902, 'eval_runtime': 52.2656, 'eval_samples_per_second': 32.316, 'eval_steps_per_second': 32.316, 

Map (num_proc=12):   0%|          | 0/4723 [00:00<?, ? examples/s]

Map (num_proc=12):   0%|          | 0/1706 [00:00<?, ? examples/s]

  0%|          | 0/885 [00:00<?, ?it/s]

{'loss': 2.5476, 'learning_rate': 2.5280898876404495e-06, 'epoch': 0.03}
{'loss': 1.5838, 'learning_rate': 5.3370786516853935e-06, 'epoch': 0.07}
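
The `eval_f5` values in the logs above can be reproduced from the logged `eval_precision` and `eval_recall` with the standard F-beta formula at beta = 5. The snippet below is a minimal sketch for checking this, not the notebook's `MetricsComputer`; the helper name `f_beta` is hypothetical, and the numbers are copied from complete log lines above.

```python
def f_beta(precision: float, recall: float, beta: float = 5.0) -> float:
    """F-beta from precision and recall; beta > 1 weights recall more heavily."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (label, precision, recall, logged f5), copied from the evaluation logs.
logged = [
    ("overall, epoch 1.86", 0.8171641791044776, 0.8902439024390244, 0.8871922717357432),
    ("overall, epoch 2.36", 0.806060606060606, 0.9010840108401084, 0.8970168612191958),
    ("USERNAME, epoch 2.36", 0.8, 1.0, 0.9904761904761905),
]

for label, p, r, f5 in logged:
    print(f"{label}: recomputed={f_beta(p, r):.6f} logged={f5:.6f}")
```

With beta = 5, recall carries 25 times the weight of precision in the denominator. This is why the checkpoint with precision 0.668 and recall 0.958 logs an `eval_f5` of 0.942, higher than checkpoints with better precision but lower recall.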


In [None]:

# Inspect the `length` field of training examples 60-74
for i in range(60, 75):
    print(train_ds[i]['length'])

1355
1040
586
966
763
572
438
543
635
940
875
421
916
920
515


In [None]:
train_ds[69]['length']


940