## ⚡️ How to train DeBERTa to reach LB: 0.966
This is the training script I used to train the model in my notebook ["DeBERTa-v3 Single Model LB:0.966"](https://www.kaggle.com/code/emiz6413/deberta-v3-single-model-lb-0-966)

#### Key points are as follows:
1. The (original) training data is split into 4 folds according to `document % 4`

2. [MPWARE's dataset](https://www.kaggle.com/datasets/mpware/pii-mixtral8x7b-generated-essays) is added to each training set
I tried a few other external datasets (though not very extensively) and found that this one works best

3. The first 6 encoder layers are frozen, while the embedding layer remains trainable

4. Training and evaluation data are truncated with `MAX_LENGTH=3072` <br>
My previous [notebook](https://www.kaggle.com/code/emiz6413/945-947-deberta-v3-base-infer-truncation-false) found that inference on longer input sequences performs better than striding inference, which made me think that longer input sequences during training would also improve performance.

5. Evaluation runs every 50 steps and computes overall and entity-wise F5/recall/precision from spaCy token-level predictions and ground truth. The best checkpoint in each fold, selected by eval/f5, is used as the final model
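Key point 1 can be sketched in a few lines. This is only an illustration, not the notebook's exact code: `split_by_document_id` and the toy document ids are hypothetical, and the sketch assumes that fold `s` holds out the documents with `document % 4 == s` for evaluation and trains on the rest.

```python
def split_by_document_id(document_ids, n_splits=4):
    """Return one (train_indices, eval_indices) pair per fold.

    Assumption for this sketch: documents whose id satisfies
    `id % n_splits == fold` are held out for evaluation in that fold.
    """
    folds = []
    for fold in range(n_splits):
        train_idx = [i for i, d in enumerate(document_ids) if int(d) % n_splits != fold]
        eval_idx = [i for i, d in enumerate(document_ids) if int(d) % n_splits == fold]
        folds.append((train_idx, eval_idx))
    return folds


# Toy document ids (hypothetical); the real ones come from train.json.
doc_ids = ["7", "10", "16", "20", "56", "86", "93", "104"]
folds = split_by_document_id(doc_ids)
for fold, (train_idx, eval_idx) in enumerate(folds):
    print(fold, [doc_ids[i] for i in eval_idx])
```

Because the split depends only on the document id, every document lands in exactly one evaluation fold and the assignment is reproducible without a random seed.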

#### Note
I found by coincidence that thresholding on logits instead of on the softmax of the logits can find models that score better on the LB, even though inference is done with thresholding on softmax. Please refer to `MetricsComputer` for the implementation. I haven't figured out why this is the case, so please let me know your thoughts in the comments.
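To make the difference concrete, here is a small sketch of the thresholding rule applied to a single token, once on raw logits and once on softmax probabilities. The three-label set, the helper names and the toy logits are assumptions for illustration. It shows that the two rules can disagree, but it does not claim to explain the LB effect.

```python
import math

LABELS = ["B-NAME_STUDENT", "I-NAME_STUDENT", "O"]  # shortened, hypothetical label set
O_INDEX = LABELS.index("O")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def threshold_decode(scores, thresh=0.9):
    """Apply the rule from `create_pred_df` to one token.

    If the "O" score is below `thresh`, take the argmax after overwriting
    the "O" score with 0; otherwise take the plain argmax.
    """
    plain = max(range(len(scores)), key=scores.__getitem__)
    without_o = list(scores)
    without_o[O_INDEX] = 0
    best_without_o = max(range(len(without_o)), key=without_o.__getitem__)
    return LABELS[best_without_o if scores[O_INDEX] < thresh else plain]


for logits in ([2.0, -1.0, 3.0], [-1.0, -3.0, 0.5]):
    on_logits = threshold_decode(logits)
    on_probs = threshold_decode(softmax(logits))
    print(logits, "logits ->", on_logits, "| softmax ->", on_probs)
```

Two things differ on these toy inputs. A logit is not bounded by 1, so an "O" logit of 3.0 passes the 0.9 threshold even though its probability is only about 0.72. And overwriting the "O" logit with 0 does not remove "O" from the argmax when every other logit is negative, whereas with probabilities a zeroed "O" can only win on a tie.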

In [1]:
import json
import copy
import gc
import os
import re
from collections import defaultdict
from pathlib import Path

import torch
from torch import nn
import numpy as np
import pandas as pd
from spacy.lang.en import English
from transformers.tokenization_utils import PreTrainedTokenizerBase
from transformers.models.deberta_v2 import DebertaV2ForTokenClassification, DebertaV2TokenizerFast
from transformers.trainer import Trainer
from transformers.training_args import TrainingArguments
from transformers.trainer_utils import EvalPrediction
from transformers.data.data_collator import DataCollatorForTokenClassification
from datasets import Dataset, DatasetDict, concatenate_datasets
import wandb

## Config & Parameters

In [2]:
TRAINING_MODEL_PATH = "microsoft/deberta-v3-large"
# TRAINING_MAX_LENGTH = 3072
# EVAL_MAX_LENGTH = 3072
TRAINING_MAX_LENGTH = 4072
EVAL_MAX_LENGTH = 4072
CONF_THRESH = 0.9
LR = 2.5e-5
LR_SCHEDULER_TYPE = "linear"
NUM_EPOCHS = 5
BATCH_SIZE = 1
EVAL_BATCH_SIZE = 1
GRAD_ACCUMULATION_STEPS = 16 // BATCH_SIZE
WARMUP_RATIO = 0.1
WEIGHT_DECAY = 0.01
AMP = True
FREEZE_EMBEDDING = False
FREEZE_LAYERS = 6
N_SPLITS = 1
NEGATIVE_RATIO = 0.3  # down sample ratio of negative samples in the training set
OUTPUT_DIR = "output"
Path(OUTPUT_DIR).mkdir(exist_ok=True)

In [3]:
args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    fp16=AMP,
    learning_rate=LR,
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    gradient_accumulation_steps=GRAD_ACCUMULATION_STEPS,
    gradient_checkpointing=True,
    report_to="none",
    evaluation_strategy="steps",
    eval_steps=50,
    eval_delay=100,
    save_strategy="steps",
    save_steps=50,
    save_total_limit=10,
    logging_steps=10,
    metric_for_best_model="f5",
    greater_is_better=True,
    load_best_model_at_end=True,
    overwrite_output_dir=True,
    lr_scheduler_type=LR_SCHEDULER_TYPE,
    warmup_ratio=WARMUP_RATIO,
    weight_decay=WEIGHT_DECAY,
)
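The schedule implied by these arguments can be checked with a little arithmetic. The sketch below assumes a single device, 5395 training examples (the size shown in the `Map` output of the training cell), that `Trainer` floors the number of optimizer steps per epoch, and that it rounds the warmup step count up. `linear_schedule_lr` is a hypothetical helper that mimics a linear warmup followed by linear decay; it is not the library's own function.

```python
import math

LR = 2.5e-5
NUM_EPOCHS = 5
GRAD_ACCUMULATION_STEPS = 16  # with BATCH_SIZE = 1, one optimizer step sees 16 examples
WARMUP_RATIO = 0.1
NUM_TRAIN_EXAMPLES = 5395  # assumption: taken from the `Map` output of the training cell

steps_per_epoch = NUM_TRAIN_EXAMPLES // GRAD_ACCUMULATION_STEPS
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_schedule_lr(step):
    """Learning rate after `step` scheduler updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return LR * step / max(1, warmup_steps)
    return LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)
print(linear_schedule_lr(9))
```

Under these assumptions the run has 337 optimizer steps per epoch, 1685 in total, and 169 warmup steps. The value at scheduler step 9 agrees with the first learning rate in the training log (about 1.33e-06), which suggests the log reports the rate used for the step that just finished.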

## Dataset Preparation

In [4]:
with Path("input/pii-detection-removal-from-educational-data/train.json").open("r") as f:
    original_data = json.load(f)

with Path("input/pii-mixtral8x7b-generated-essays/mpware_mixtral8x7b_v1.1-no-i-username.json").open("r") as f:
    extra_data = json.load(f)
print("MPWARE's datapoints: ", len(extra_data))

MPWARE's datapoints:  2692


In [5]:
all_labels = [
    'B-EMAIL', 'B-ID_NUM', 'B-NAME_STUDENT', 'B-PHONE_NUM', 'B-STREET_ADDRESS', 'B-URL_PERSONAL', 'B-USERNAME', 'I-ID_NUM', 'I-NAME_STUDENT', 'I-PHONE_NUM', 'I-STREET_ADDRESS', 'I-URL_PERSONAL', 'O'
]
id2label = {i: l for i, l in enumerate(all_labels)}
label2id = {v: k for k, v in id2label.items()}
target = [l for l in all_labels if l != "O"]

## Tokenization

In [6]:
class CustomTokenizer:
    def __init__(self, tokenizer: PreTrainedTokenizerBase, label2id: dict, max_length: int) -> None:
        self.tokenizer = tokenizer
        self.label2id = label2id
        self.max_length = max_length

    def __call__(self, example: dict) -> dict:
        # rebuild text from tokens
        text, labels, token_map = [], [], []

        for idx, (t, l, ws) in enumerate(
            zip(example["tokens"], example["provided_labels"], example["trailing_whitespace"])
        ):
            text.append(t)
            labels.extend([l] * len(t))
            token_map.extend([idx]*len(t))

            if ws:
                text.append(" ")
                labels.append("O")
                token_map.append(-1)

        text = "".join(text)
        labels = np.array(labels)

        # actual tokenization
        tokenized = self.tokenizer(
            text,  # already joined into a single string above
            return_offsets_mapping=True,
            truncation=True,
            max_length=self.max_length
        )

        token_labels = []

        for start_idx, end_idx in tokenized.offset_mapping:
            # CLS token
            if start_idx == 0 and end_idx == 0:
                token_labels.append(self.label2id["O"])
                continue

            # case when token starts with whitespace
            if text[start_idx].isspace():
                start_idx += 1

            token_labels.append(self.label2id[labels[start_idx]])

        length = len(tokenized.input_ids)

        return {**tokenized, "labels": token_labels, "length": length, "token_map": token_map}
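The alignment idea in `CustomTokenizer` can be tried without loading a tokenizer. The sketch below rebuilds the text from spaCy-style tokens, labels every character, and then looks up the label for a subword from its character offsets. The helper names, the toy sentence and the offsets are hypothetical; real offsets come from `return_offsets_mapping=True`.

```python
def build_char_labels(tokens, labels, trailing_whitespace):
    """Rebuild the text and give every character a label and a source-token index."""
    text, char_labels, token_map = [], [], []
    for idx, (tok, lab, ws) in enumerate(zip(tokens, labels, trailing_whitespace)):
        text.append(tok)
        char_labels.extend([lab] * len(tok))
        token_map.extend([idx] * len(tok))
        if ws:
            text.append(" ")
            char_labels.append("O")
            token_map.append(-1)  # -1 marks whitespace between tokens
    return "".join(text), char_labels, token_map


def label_for_offset(text, char_labels, start, end):
    """Label a subword covering text[start:end] by its first non-space character."""
    if start == 0 and end == 0:  # special token such as [CLS]
        return "O"
    if text[start].isspace():  # subword starts with the preceding space
        start += 1
    return char_labels[start]


tokens = ["My", "name", "is", "Jane", "Doe", "."]
labels = ["O", "O", "O", "B-NAME_STUDENT", "I-NAME_STUDENT", "O"]
trailing_ws = [True, True, True, True, False, False]
text, char_labels, token_map = build_char_labels(tokens, labels, trailing_ws)

# Hypothetical subword offsets; each subword includes its leading space.
for start, end in [(0, 0), (10, 15), (15, 19), (19, 20)]:
    print(repr(text[start:end]), label_for_offset(text, char_labels, start, end))
```

`token_map` is what later lets the metrics code map a subword prediction back to the index of the original token.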

## Instantiate the dataset

In [7]:
tokenizer = DebertaV2TokenizerFast.from_pretrained(TRAINING_MODEL_PATH)
train_encoder = CustomTokenizer(tokenizer=tokenizer, label2id=label2id, max_length=TRAINING_MAX_LENGTH)
eval_encoder = CustomTokenizer(tokenizer=tokenizer, label2id=label2id, max_length=EVAL_MAX_LENGTH)

ds = DatasetDict()

for key, data in zip(["original", "extra"], [original_data, extra_data]):
    ds[key] = Dataset.from_dict({
        "full_text": [x["full_text"] for x in data],
        "document": [str(x["document"]) for x in data],
        "tokens": [x["tokens"] for x in data],
        "trailing_whitespace": [x["trailing_whitespace"] for x in data],
        "provided_labels": [x["labels"] for x in data],
    })



## Metrics

In [8]:
def find_span(target: list[str], document: list[str]) -> list[list[int]]:
    idx = 0
    spans = []
    span = []

    for i, token in enumerate(document):
        if token != target[idx]:
            idx = 0
            span = []
            continue
        span.append(i)
        idx += 1
        if idx == len(target):
            spans.append(span)
            span = []
            idx = 0
            continue

    return spans


class PRFScore:
    """A precision / recall / F score."""

    def __init__(
        self,
        *,
        tp: int = 0,
        fp: int = 0,
        fn: int = 0,
    ) -> None:
        self.tp = tp
        self.fp = fp
        self.fn = fn

    def __len__(self) -> int:
        return self.tp + self.fp + self.fn

    def __iadd__(self, other):  # in-place add
        self.tp += other.tp
        self.fp += other.fp
        self.fn += other.fn
        return self

    def __add__(self, other):
        return PRFScore(
            tp=self.tp + other.tp, fp=self.fp + other.fp, fn=self.fn + other.fn
        )

    def score_set(self, cand: set, gold: set) -> None:
        self.tp += len(cand.intersection(gold))
        self.fp += len(cand - gold)
        self.fn += len(gold - cand)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp + 1e-100)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn + 1e-100)

    @property
    def f1(self) -> float:
        p = self.precision
        r = self.recall
        return 2 * ((p * r) / (p + r + 1e-100))

    @property
    def f5(self) -> float:
        beta = 5
        p = self.precision
        r = self.recall

        fbeta = (1+(beta**2))*p*r / ((beta**2)*p + r + 1e-100)
        return fbeta

    def to_dict(self) -> dict[str, float]:
        return {"p": self.precision, "r": self.recall, "f5": self.f5}
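As a quick check of the `f5` property, the formula can be applied to the precision and recall reported at the first evaluation in the training log. `fbeta` is a hypothetical standalone helper that uses the same formula without the `1e-100` guard.

```python
def fbeta(precision, recall, beta=5.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Precision and recall copied from the first evaluation in the training log.
p, r = 0.6934579439252336, 0.7995689655172413
print("F5:", fbeta(p, r))
print("F1:", fbeta(p, r, beta=1.0))
```

With `beta = 5` the score stays close to recall. That is why, in the training log, the checkpoint at epoch 1.04 (precision 0.688, recall 0.983) reaches a higher F5 (0.967) than the one at epoch 0.74 (precision 0.958, recall 0.888, F5 0.890).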

In [9]:
class MetricsComputer:
    nlp = English()

    def __init__(self, eval_ds: Dataset, label2id: dict, conf_thresh: float = 0.9) -> None:
        self.ds = eval_ds.remove_columns("labels").rename_columns({"provided_labels": "labels"})
        self.gt_df = self.create_gt_df(self.ds)
        self.label2id = label2id
        self.confth = conf_thresh
        self._search_gt()

    def __call__(self, eval_preds: EvalPrediction) -> dict:
        pred_df = self.create_pred_df(eval_preds.predictions)
        return self.compute_metrics_from_df(self.gt_df, pred_df)

    def _search_gt(self) -> None:
        email_regex = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')
        phone_num_regex = re.compile(r"(\(\d{3}\)\d{3}\-\d{4}\w*|\d{3}\.\d{3}\.\d{4})\s")
        self.emails = []
        self.phone_nums = []

        for _data in self.ds:
            # email
            for token_idx, token in enumerate(_data["tokens"]):
                if re.fullmatch(email_regex, token) is not None:
                    self.emails.append(
                        {"document": _data["document"], "token": token_idx, "label": "B-EMAIL", "token_str": token}
                    )
            # phone number
            matches = phone_num_regex.findall(_data["full_text"])
            if not matches:
                continue
            for match in matches:
                target = [t.text for t in self.nlp.tokenizer(match)]
                # collect the spans of every match, not only of the last one
                for matched_span in find_span(target, _data["tokens"]):
                    for intermediate, token_idx in enumerate(matched_span):
                        prefix = "I" if intermediate else "B"
                        self.phone_nums.append(
                            {"document": _data["document"], "token": token_idx, "label": f"{prefix}-PHONE_NUM", "token_str": _data["tokens"][token_idx]}
                        )

    @staticmethod
    def create_gt_df(ds: Dataset):
        gt = []
        for row in ds:
            for token_idx, (token, label) in enumerate(zip(row["tokens"], row["labels"])):
                if label == "O":
                    continue
                gt.append(
                    {"document": row["document"], "token": token_idx, "label": label, "token_str": token}
                )
        gt_df = pd.DataFrame(gt)
        gt_df["row_id"] = gt_df.index

        return gt_df

    def create_pred_df(self, logits: np.ndarray) -> pd.DataFrame:
        """
        Note:
            Thresholding is done on logits instead of softmax, which could find better models on LB.
        """
        prediction = logits
        o_index = self.label2id["O"]
        preds = prediction.argmax(-1)
        preds_without_o = prediction.copy()
        preds_without_o[:,:,o_index] = 0
        preds_without_o = preds_without_o.argmax(-1)
        o_preds = prediction[:,:,o_index]
        preds_final = np.where(o_preds < self.confth, preds_without_o , preds)

        pairs = set()
        processed = []

        # Iterate over document
        for p_doc, token_map, offsets, tokens, doc in zip(
            preds_final, self.ds["token_map"], self.ds["offset_mapping"], self.ds["tokens"], self.ds["document"]
        ):
            # Iterate over sequence
            for p_token, (start_idx, end_idx) in zip(p_doc, offsets):
                label_pred = id2label[p_token]

                if start_idx + end_idx == 0:
                    # [CLS] token i.e. BOS
                    continue

                if token_map[start_idx] == -1:
                    start_idx += 1

                # ignore "\n\n"
                while start_idx < len(token_map) and tokens[token_map[start_idx]].isspace():
                    start_idx += 1

                if start_idx >= len(token_map):
                    break

                token_id = token_map[start_idx]
                pair = (doc, token_id)

                # ignore "O", preds, phone number and  email
                if label_pred in ("O", "B-EMAIL", "B-PHONE_NUM", "I-PHONE_NUM") or token_id == -1:
                    continue

                if pair in pairs:
                    continue

                processed.append(
                    {"document": doc, "token": token_id, "label": label_pred, "token_str": tokens[token_id]}
                )
                pairs.add(pair)

        pred_df = pd.DataFrame(processed + self.emails + self.phone_nums)
        pred_df["row_id"] = list(range(len(pred_df)))

        return pred_df

    def compute_metrics_from_df(self, gt_df, pred_df):
        """
        Compute the LB metric (lb) and other auxiliary metrics
        """

        references = {(row.document, row.token, row.label) for row in gt_df.itertuples()}
        predictions = {(row.document, row.token, row.label) for row in pred_df.itertuples()}

        score_per_type = defaultdict(PRFScore)
        references = set(references)

        for ex in predictions:
            pred_type = ex[-1] # (document, token, label)
            if pred_type != 'O':
                pred_type = pred_type[2:] # avoid B- and I- prefix

            if pred_type not in score_per_type:
                score_per_type[pred_type] = PRFScore()

            if ex in references:
                score_per_type[pred_type].tp += 1
                references.remove(ex)
            else:
                score_per_type[pred_type].fp += 1

        for doc, tok, ref_type in references:
            if ref_type != 'O':
                ref_type = ref_type[2:] # avoid B- and I- prefix

            if ref_type not in score_per_type:
                score_per_type[ref_type] = PRFScore()
            score_per_type[ref_type].fn += 1

        totals = PRFScore()

        for prf in score_per_type.values():
            totals += prf

        return {
            "precision": totals.precision,
            "recall": totals.recall,
            "f5": totals.f5,
            **{
                f"{v_k}-{k}": v_v
                for k in set([l[2:] for l in self.label2id.keys() if l!= 'O'])
                for v_k, v_v in score_per_type[k].to_dict().items()
            },
        }
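The counting logic in `compute_metrics_from_df` can be illustrated on a toy example. The sketch below is a simplified stand-in, not the class method: `count_tp_fp_fn` and the triples are hypothetical. As in the method, a prediction only counts as a true positive if the full label (including the `B-`/`I-` prefix) matches, while the counts are grouped by entity type.

```python
from collections import Counter


def count_tp_fp_fn(predictions, references):
    """Count tp/fp/fn per entity type over sets of (document, token, label) triples."""
    counts = Counter()
    for doc, tok, label in predictions:
        kind = "tp" if (doc, tok, label) in references else "fp"
        counts[(label[2:], kind)] += 1  # drop the "B-"/"I-" prefix
    for doc, tok, label in references - predictions:
        counts[(label[2:], "fn")] += 1
    return counts


references = {("7", 9, "B-NAME_STUDENT"), ("7", 10, "I-NAME_STUDENT"), ("10", 3, "B-EMAIL")}
# The second prediction has the right entity type but the wrong prefix.
predictions = {("7", 9, "B-NAME_STUDENT"), ("7", 10, "B-NAME_STUDENT"), ("10", 3, "B-EMAIL")}

counts = count_tp_fp_fn(predictions, references)
for key in sorted(counts):
    print(key, counts[key])
```

A wrong prefix therefore costs twice: the prediction becomes a false positive and the unmatched reference becomes a false negative for the same entity type.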

## Model

In [10]:
class ModelInit:
    def __init__(
        self,
        checkpoint: str,
        id2label: dict,
        label2id: dict,
        freeze_embedding: bool,
        freeze_layers: int,
    ) -> None:
        self.model = DebertaV2ForTokenClassification.from_pretrained(
            checkpoint,
            num_labels=len(id2label),
            id2label=id2label,
            label2id=label2id,
            ignore_mismatched_sizes=True
        )
        for param in self.model.deberta.embeddings.parameters():
            param.requires_grad = not freeze_embedding
        for layer in self.model.deberta.encoder.layer[:freeze_layers]:
            for param in layer.parameters():
                param.requires_grad = False
        self.weight = copy.deepcopy(self.model.state_dict())

    def __call__(self) -> DebertaV2ForTokenClassification:
        self.model.load_state_dict(self.weight)
        return self.model

model_init = ModelInit(
    TRAINING_MODEL_PATH,
    id2label=id2label,
    label2id=label2id,
    freeze_embedding=FREEZE_EMBEDDING,
    freeze_layers=FREEZE_LAYERS,
)

Some weights of DebertaV2ForTokenClassification were not initialized from the model checkpoint at microsoft/deberta-v3-large and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Split
Split the original dataset into 4 folds according to `document % 4` <br>
Only the first 30% of negative samples are used in the training set, but negatives are NOT excluded from the eval set, so that cross-validation covers the entire training dataset.
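The negative down-sampling can be sketched on toy data as follows. A "negative" document is one whose labels are all `"O"`. `select_training_indices` and the toy label lists are hypothetical, and the sketch ignores the fold split for brevity.

```python
NEGATIVE_RATIO = 0.3


def select_training_indices(labels_per_doc, negative_ratio=NEGATIVE_RATIO):
    """Keep all positive documents and only the first `negative_ratio` of negatives."""
    negative_idxs = [
        i for i, labels in enumerate(labels_per_doc) if all(l == "O" for l in labels)
    ]
    n_keep = int(len(negative_idxs) * negative_ratio)
    excluded = set(negative_idxs[n_keep:])  # negatives dropped from training
    return [i for i in range(len(labels_per_doc)) if i not in excluded]


# Toy data: one positive document, ten negatives, one more positive document.
docs = [["O", "B-NAME_STUDENT"]] + [["O", "O"]] * 10 + [["B-EMAIL"]]
print(select_training_indices(docs))
```

Only the training indices are filtered this way; the evaluation indices are left untouched so that the metrics still reflect the full class balance.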

In [11]:
# split according to document id
folds = [
    (
        np.array([i for i, d in enumerate(ds["original"]["document"]) if int(d) % N_SPLITS == s]),
        np.array([i for i, d in enumerate(ds["original"]["document"]) if int(d) % N_SPLITS != s]),
    )
    for s in range(N_SPLITS)
]

negative_idxs = [i for i, labels in enumerate(ds["original"]["provided_labels"]) if not any(np.array(labels) != "O")]
exclude_indices = set(negative_idxs[int(len(negative_idxs) * NEGATIVE_RATIO):])  # set for fast membership tests

In [12]:
folds

[(array([   0,    1,    2, ..., 6804, 6805, 6806]), array([], dtype=float64))]

## Train
Performs cross-validation and saves the best checkpoint's metrics as JSON.

In [13]:
for fold_idx, (train_idx, eval_idx) in enumerate(folds):
    eval_idx = np.random.choice(train_idx, len(train_idx) // 6, replace=False)
    args.run_name = f"fold-{fold_idx}"
    args.output_dir = os.path.join(OUTPUT_DIR, f"fold_{fold_idx}")
    original_ds = ds["original"].select([i for i in train_idx if i not in exclude_indices])
    train_ds = concatenate_datasets([original_ds, ds["extra"]])
    train_ds = train_ds.map(train_encoder, num_proc=os.cpu_count())
    eval_ds = ds["original"].select(eval_idx)
    eval_ds = eval_ds.map(eval_encoder, num_proc=os.cpu_count())
    trainer = Trainer(
        args=args,
        model_init=model_init,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        tokenizer=tokenizer,
        compute_metrics=MetricsComputer(eval_ds=eval_ds, label2id=label2id),
        data_collator=DataCollatorForTokenClassification(tokenizer, pad_to_multiple_of=16),
    )
    # break # delete this line to reproduce the result.
    trainer.train()
    eval_res = trainer.evaluate(eval_dataset=eval_ds)
    with open(os.path.join(args.output_dir, "eval_result.json"), "w") as f:
        json.dump(eval_res, f)
    del trainer
    gc.collect()
    torch.cuda.empty_cache()

Map (num_proc=12):   0%|          | 0/5395 [00:00<?, ? examples/s]

Map (num_proc=12):   0%|          | 0/1134 [00:00<?, ? examples/s]

  0%|          | 0/1685 [00:00<?, ?it/s]

{'loss': 2.4813, 'learning_rate': 1.3313609467455623e-06, 'epoch': 0.03}
{'loss': 2.0736, 'learning_rate': 2.8106508875739646e-06, 'epoch': 0.06}
{'loss': 1.0406, 'learning_rate': 4.2899408284023666e-06, 'epoch': 0.09}
{'loss': 0.2373, 'learning_rate': 5.76923076923077e-06, 'epoch': 0.12}
{'loss': 0.1023, 'learning_rate': 7.248520710059171e-06, 'epoch': 0.15}
{'loss': 0.0647, 'learning_rate': 8.727810650887574e-06, 'epoch': 0.18}
{'loss': 0.0413, 'learning_rate': 1.0207100591715976e-05, 'epoch': 0.21}
{'loss': 0.0303, 'learning_rate': 1.168639053254438e-05, 'epoch': 0.24}
{'loss': 0.0322, 'learning_rate': 1.3165680473372782e-05, 'epoch': 0.27}
{'loss': 0.017, 'learning_rate': 1.4644970414201184e-05, 'epoch': 0.3}


{'eval_loss': 0.0030079674907028675, 'eval_precision': 0.6934579439252336, 'eval_recall': 0.7995689655172413, 'eval_f5': 0.7948908117016893, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 0.0, 'eval_r-USERNAME': 0.0, 'eval_f5-USERNAME': 0.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 0.3333333333333333, 'eval_f5-ID_NUM': 0.34210526315789475, 'eval_p-NAME_STUDENT': 0.7066381156316917, 'eval_r-NAME_STUDENT': 0.7971014492753623, 'eval_f5-NAME_STUDENT': 0.7931958953499121, 'eval_p-URL_PERSONAL': 0.4807692307692308, 'eval_r-URL_PERSONAL': 0.9259259259259259, 'eval_f5-URL_PERSONAL': 0.8940852819807428, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.4422, 'eval_samples_per_second': 32.925, 'eval_steps_per_second': 32.925, 'epoch': 0.3}
{'loss': 0.0164, 'learning_rate': 1.6124260355029585e-05, 'epoch': 0.33}
{'loss': 0.0134, 

{'eval_loss': 0.0013237153179943562, 'eval_precision': 0.8845360824742268, 'eval_recall': 0.9245689655172413, 'eval_f5': 0.9229623500206867, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 0.5, 'eval_r-USERNAME': 0.3333333333333333, 'eval_f5-USERNAME': 0.3376623376623376, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 0.8333333333333334, 'eval_f5-ID_NUM': 0.8387096774193549, 'eval_p-NAME_STUDENT': 0.888631090487239, 'eval_r-NAME_STUDENT': 0.9251207729468599, 'eval_f5-NAME_STUDENT': 0.923661997959373, 'eval_p-URL_PERSONAL': 0.7878787878787878, 'eval_r-URL_PERSONAL': 0.9629629629629629, 'eval_f5-URL_PERSONAL': 0.9548022598870057, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 35.0288, 'eval_samples_per_second': 32.373, 'eval_steps_per_second': 32.373, 'epoch': 0.44}
{'loss': 0.0062, 'learning_rate': 2.3520710059171598e-05, 'epo

{'eval_loss': 0.0009881724836304784, 'eval_precision': 0.8970588235294118, 'eval_recall': 0.9202586206896551, 'eval_f5': 0.919344153693276, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 0.3333333333333333, 'eval_f5-USERNAME': 0.34210526315789475, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.8333333333333334, 'eval_r-ID_NUM': 0.8333333333333334, 'eval_f5-ID_NUM': 0.8333333333333334, 'eval_p-NAME_STUDENT': 0.9026128266033254, 'eval_r-NAME_STUDENT': 0.9178743961352657, 'eval_f5-NAME_STUDENT': 0.9172778757775509, 'eval_p-URL_PERSONAL': 0.7941176470588235, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9901269393511988, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.4322, 'eval_samples_per_second': 32.934, 'eval_steps_per_second': 32.934, 'epoch': 0.59}
{'loss': 0.0048, 'learning_rate': 2.4340369393139843e-05, 'e

{'eval_loss': 0.000931334332562983, 'eval_precision': 0.958139534883721, 'eval_recall': 0.8879310344827587, 'eval_f5': 0.890440565253533, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 0.6666666666666666, 'eval_f5-USERNAME': 0.6753246753246752, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.625, 'eval_r-ID_NUM': 0.8333333333333334, 'eval_f5-ID_NUM': 0.8227848101265824, 'eval_p-NAME_STUDENT': 0.961038961038961, 'eval_r-NAME_STUDENT': 0.893719806763285, 'eval_f5-NAME_STUDENT': 0.8961341406613877, 'eval_p-URL_PERSONAL': 1.0, 'eval_r-URL_PERSONAL': 0.7777777777777778, 'eval_f5-URL_PERSONAL': 0.7844827586206896, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.9229, 'eval_samples_per_second': 32.472, 'eval_steps_per_second': 32.472, 'epoch': 0.74}
{'loss': 0.0031, 'learning_rate': 2.3515831134564646e-05, 'epoch': 0.77}
{'los

{'eval_loss': 0.0006890265503898263, 'eval_precision': 0.8861283643892339, 'eval_recall': 0.9224137931034483, 'eval_f5': 0.9209633369196392, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 0.3333333333333333, 'eval_f5-USERNAME': 0.34210526315789475, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 0.8333333333333334, 'eval_f5-ID_NUM': 0.8387096774193549, 'eval_p-NAME_STUDENT': 0.8881118881118881, 'eval_r-NAME_STUDENT': 0.9202898550724637, 'eval_f5-NAME_STUDENT': 0.919009184525466, 'eval_p-URL_PERSONAL': 0.7941176470588235, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9901269393511988, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.3731, 'eval_samples_per_second': 32.991, 'eval_steps_per_second': 32.991, 'epoch': 0.89}
{'loss': 0.0015, 'learning_rate': 2.269129287598945e-05, 'epoch': 0.92}
{'l

{'eval_loss': 0.001369777019135654, 'eval_precision': 0.6877828054298643, 'eval_recall': 0.9827586206896551, 'eval_f5': 0.9668107314686456, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 0.3333333333333333, 'eval_f5-USERNAME': 0.34210526315789475, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-NAME_STUDENT': 0.6721581548599671, 'eval_r-NAME_STUDENT': 0.9855072463768116, 'eval_f5-NAME_STUDENT': 0.9681482157524871, 'eval_p-URL_PERSONAL': 0.7714285714285715, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9887323943661971, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 35.377, 'eval_samples_per_second': 32.055, 'eval_steps_per_second': 32.055, 'epoch': 1.04}
{'loss': 0.0022, 'learning_rate': 2.1866754617414248e-05, 'epoch': 1.07}
{'loss': 0.0017, 'learning_rate':

{'eval_loss': 0.0006126861553639174, 'eval_precision': 0.9148073022312373, 'eval_recall': 0.9719827586206896, 'eval_f5': 0.9696518647151244, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 0.3333333333333333, 'eval_f5-USERNAME': 0.34210526315789475, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-NAME_STUDENT': 0.9200913242009132, 'eval_r-NAME_STUDENT': 0.9734299516908212, 'eval_f5-NAME_STUDENT': 0.9712643678160919, 'eval_p-URL_PERSONAL': 0.7941176470588235, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9901269393511988, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.332, 'eval_samples_per_second': 33.03, 'eval_steps_per_second': 33.03, 'epoch': 1.19}
{'loss': 0.0015, 'learning_rate': 2.104221635883905e-05, 'epoch': 1.22}
{'loss': 0.0015, 'learning_rate': 2

{'eval_loss': 0.0008599219145253301, 'eval_precision': 0.7170111287758346, 'eval_recall': 0.9719827586206896, 'eval_f5': 0.9588682639627117, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.46153846153846156, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9570552147239264, 'eval_p-NAME_STUDENT': 0.7075306479859895, 'eval_r-NAME_STUDENT': 0.9758454106280193, 'eval_f5-NAME_STUDENT': 0.9618166834538963, 'eval_p-URL_PERSONAL': 0.8571428571428571, 'eval_r-URL_PERSONAL': 0.8888888888888888, 'eval_f5-URL_PERSONAL': 0.8876244665718349, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 35.0196, 'eval_samples_per_second': 32.382, 'eval_steps_per_second': 32.382, 'epoch': 1.33}
{'loss': 0.0022, 'learning_rate': 2.0217678100263853e-05, 'epoch': 1.36}
{'loss': 0.0011,

{'eval_loss': 0.000976880663074553, 'eval_precision': 0.742998352553542, 'eval_recall': 0.9719827586206896, 'eval_f5': 0.9605963791267305, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.46153846153846156, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9570552147239264, 'eval_p-NAME_STUDENT': 0.7425925925925926, 'eval_r-NAME_STUDENT': 0.9685990338164251, 'eval_f5-NAME_STUDENT': 0.9573921028466483, 'eval_p-URL_PERSONAL': 0.7297297297297297, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9859550561797753, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.6451, 'eval_samples_per_second': 32.732, 'eval_steps_per_second': 32.732, 'epoch': 1.48}
{'loss': 0.0008, 'learning_rate': 1.9393139841688653e-05, 'epoch': 1.51}
{'loss': 0.0008, 'learning_rate':

{'eval_loss': 0.000808916287496686, 'eval_precision': 0.7417763157894737, 'eval_recall': 0.9719827586206896, 'eval_f5': 0.9605176933158583, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 0.75, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 0.9873417721518988, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.5, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9629629629629629, 'eval_p-NAME_STUDENT': 0.7371323529411765, 'eval_r-NAME_STUDENT': 0.9685990338164251, 'eval_f5-NAME_STUDENT': 0.9570405727923629, 'eval_p-URL_PERSONAL': 0.7941176470588235, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9901269393511988, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.9677, 'eval_samples_per_second': 32.43, 'eval_steps_per_second': 32.43, 'epoch': 1.63}
{'loss': 0.0008, 'learning_rate': 1.8568601583113456e-05, 'epoch': 1.66}
{'loss': 0.0015, 'learning_rate': 

{'eval_loss': 0.0007353064138442278, 'eval_precision': 0.798941798941799, 'eval_recall': 0.9762931034482759, 'eval_f5': 0.9680282731979944, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-NAME_STUDENT': 0.7901960784313725, 'eval_r-NAME_STUDENT': 0.9734299516908212, 'eval_f5-NAME_STUDENT': 0.9648250460405157, 'eval_p-URL_PERSONAL': 0.7941176470588235, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9901269393511988, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.6993, 'eval_samples_per_second': 32.681, 'eval_steps_per_second': 32.681, 'epoch': 1.78}
{'loss': 0.0004, 'learning_rate': 1.774406332453826e-05, 'epoch': 1.81}
{'loss': 0.0013, 'learning_rate': 1.757915567282322e-05, 'epoch'

{'eval_loss': 0.0009350383770652115, 'eval_precision': 0.7727272727272727, 'eval_recall': 0.9892241379310345, 'eval_f5': 0.9786780383795308, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.8571428571428571, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9936305732484076, 'eval_p-NAME_STUDENT': 0.7644859813084112, 'eval_r-NAME_STUDENT': 0.9879227053140096, 'eval_f5-NAME_STUDENT': 0.9769407441433166, 'eval_p-URL_PERSONAL': 0.7714285714285715, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9887323943661971, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 35.1035, 'eval_samples_per_second': 32.304, 'eval_steps_per_second': 32.304, 'epoch': 1.93}
{'loss': 0.0019, 'learning_rate': 1.691952506596306e-05, 'epoch': 1.96}
{'loss': 0.001, 'learning_rate': 

{'eval_loss': 0.0003761273983400315, 'eval_precision': 0.8947368421052632, 'eval_recall': 0.9892241379310345, 'eval_f5': 0.9852224882357797, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-NAME_STUDENT': 0.8891304347826087, 'eval_r-NAME_STUDENT': 0.9879227053140096, 'eval_f5-NAME_STUDENT': 0.9837187789084182, 'eval_p-URL_PERSONAL': 0.9, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9957446808510639, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.4292, 'eval_samples_per_second': 32.937, 'eval_steps_per_second': 32.937, 'epoch': 2.08}
{'loss': 0.0006, 'learning_rate': 1.6094986807387864e-05, 'epoch': 2.11}
{'loss': 0.0014, 'learning_rate': 1.5930079155672825e-05, 'epoch': 2.14}
{'lo

{'eval_loss': 0.00039824325358495116, 'eval_precision': 0.8798449612403101, 'eval_recall': 0.978448275862069, 'eval_f5': 0.9742489270386266, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.375, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9397590361445783, 'eval_p-NAME_STUDENT': 0.8977777777777778, 'eval_r-NAME_STUDENT': 0.9758454106280193, 'eval_f5-NAME_STUDENT': 0.9725925925925927, 'eval_p-URL_PERSONAL': 0.8181818181818182, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9915254237288135, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.8699, 'eval_samples_per_second': 32.521, 'eval_steps_per_second': 32.521, 'epoch': 2.22}
{'loss': 0.0003, 'learning_rate': 1.5270448548812667e-05, 'epoch': 2.25}
{'loss': 0.0009, 'learning_rate': 1.510554089

{'eval_loss': 0.000348295463481918, 'eval_precision': 0.905811623246493, 'eval_recall': 0.9741379310344828, 'eval_f5': 0.971319943797008, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.42857142857142855, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.951219512195122, 'eval_p-NAME_STUDENT': 0.9241379310344827, 'eval_r-NAME_STUDENT': 0.9710144927536232, 'eval_f5-NAME_STUDENT': 0.9691237830319888, 'eval_p-URL_PERSONAL': 0.8181818181818182, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9915254237288135, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.8939, 'eval_samples_per_second': 32.499, 'eval_steps_per_second': 32.499, 'epoch': 2.37}
{'loss': 0.0007, 'learning_rate': 1.4445910290237468e-05, 'epoch': 2.4}
{'loss': 0.0004, 'learning_rate': 1.

{'eval_loss': 0.0004568410513456911, 'eval_precision': 0.8919449901768173, 'eval_recall': 0.978448275862069, 'eval_f5': 0.9748121232141383, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.46153846153846156, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9570552147239264, 'eval_p-NAME_STUDENT': 0.9078651685393259, 'eval_r-NAME_STUDENT': 0.9758454106280193, 'eval_f5-NAME_STUDENT': 0.9730430754979159, 'eval_p-URL_PERSONAL': 0.7941176470588235, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9901269393511988, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.7801, 'eval_samples_per_second': 32.605, 'eval_steps_per_second': 32.605, 'epoch': 2.52}
{'loss': 0.0006, 'learning_rate': 1.362137203166227e-05, 'epoch': 2.55}
{'loss': 0.0003, 'learning_rate':

{'eval_loss': 0.0003943504998460412, 'eval_precision': 0.8669201520912547, 'eval_recall': 0.9827586206896551, 'eval_f5': 0.9777337951509153, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.4, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9454545454545454, 'eval_p-NAME_STUDENT': 0.8864628820960698, 'eval_r-NAME_STUDENT': 0.9806763285024155, 'eval_f5-NAME_STUDENT': 0.9766839378238341, 'eval_p-URL_PERSONAL': 0.75, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9873417721518988, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.7396, 'eval_samples_per_second': 32.643, 'eval_steps_per_second': 32.643, 'epoch': 2.67}
{'loss': 0.0002, 'learning_rate': 1.2796833773087072e-05, 'epoch': 2.7}
{'loss': 0.0012, 'learning_rate': 1.2631926121372032e-05, 'epo

{'eval_loss': 0.0002964325831271708, 'eval_precision': 0.9085487077534792, 'eval_recall': 0.9849137931034483, 'eval_f5': 0.9817400644468314, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.4, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9454545454545454, 'eval_p-NAME_STUDENT': 0.9292237442922374, 'eval_r-NAME_STUDENT': 0.9830917874396136, 'eval_f5-NAME_STUDENT': 0.9809047089358547, 'eval_p-URL_PERSONAL': 0.8181818181818182, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9915254237288135, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.7519, 'eval_samples_per_second': 32.631, 'eval_steps_per_second': 32.631, 'epoch': 2.82}
{'loss': 0.0003, 'learning_rate': 1.1972295514511873e-05, 'epoch': 2.85}
{'loss': 0.0003, 'learning_rate': 1.18073878627

{'eval_loss': 0.00032708930666558444, 'eval_precision': 0.8888888888888888, 'eval_recall': 1.0, 'eval_f5': 0.9952153110047847, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-NAME_STUDENT': 0.8808510638297873, 'eval_r-NAME_STUDENT': 1.0, 'eval_f5-NAME_STUDENT': 0.9948243992606285, 'eval_p-URL_PERSONAL': 0.9310344827586207, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9971590909090909, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.7088, 'eval_samples_per_second': 32.672, 'eval_steps_per_second': 32.672, 'epoch': 2.97}
{'loss': 0.0002, 'learning_rate': 1.1147757255936676e-05, 'epoch': 3.0}
{'loss': 0.0003, 'learning_rate': 1.0982849604221636e-05, 'epoch': 3.03}
{'loss': 0.0002, 'l

{'eval_loss': 0.00033546617487445474, 'eval_precision': 0.9252525252525252, 'eval_recall': 0.9870689655172413, 'eval_f5': 0.9845390657296402, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-NAME_STUDENT': 0.917960088691796, 'eval_r-NAME_STUDENT': 1.0, 'eval_f5-NAME_STUDENT': 0.9965743912600685, 'eval_p-URL_PERSONAL': 1.0, 'eval_r-URL_PERSONAL': 0.7777777777777778, 'eval_f5-URL_PERSONAL': 0.7844827586206896, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 35.0955, 'eval_samples_per_second': 32.312, 'eval_steps_per_second': 32.312, 'epoch': 3.11}
{'loss': 0.0003, 'learning_rate': 1.0323218997361477e-05, 'epoch': 3.14}
{'loss': 0.0002, 'learning_rate': 1.0158311345646438e-05, 'epoch': 3.17}
{'lo

{'eval_loss': 0.0002648408408276737, 'eval_precision': 0.9188118811881189, 'eval_recall': 1.0, 'eval_f5': 0.9966129698471705, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-NAME_STUDENT': 0.92, 'eval_r-NAME_STUDENT': 1.0, 'eval_f5-NAME_STUDENT': 0.9966666666666667, 'eval_p-URL_PERSONAL': 0.84375, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9929278642149929, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.7188, 'eval_samples_per_second': 32.662, 'eval_steps_per_second': 32.662, 'epoch': 3.26}
{'loss': 0.0004, 'learning_rate': 9.49868073878628e-06, 'epoch': 3.29}
{'loss': 0.0004, 'learning_rate': 9.33377308707124e-06, 'epoch': 3.32}
{'loss': 0.0004, 'learning_rate': 9.168865435356

{'eval_loss': 0.00038401922211050987, 'eval_precision': 0.8738229755178908, 'eval_recall': 1.0, 'eval_f5': 0.9944769598549171, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 0.8571428571428571, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 0.9936305732484076, 'eval_p-NAME_STUDENT': 0.8734177215189873, 'eval_r-NAME_STUDENT': 1.0, 'eval_f5-NAME_STUDENT': 0.9944567627494456, 'eval_p-URL_PERSONAL': 0.8181818181818182, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9915254237288135, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.797, 'eval_samples_per_second': 32.589, 'eval_steps_per_second': 32.589, 'epoch': 3.41}
{'loss': 0.0002, 'learning_rate': 8.674142480211083e-06, 'epoch': 3.44}
{'loss': 0.0002, 'learning_rate': 8.509234828496044e-06, 'epoch

{'eval_loss': 0.0003192301082890481, 'eval_precision': 0.8838095238095238, 'eval_recall': 1.0, 'eval_f5': 0.9949690721649485, 'eval_p-STREET_ADDRESS': 0.0, 'eval_r-STREET_ADDRESS': 0.0, 'eval_f5-STREET_ADDRESS': 0.0, 'eval_p-USERNAME': 1.0, 'eval_r-USERNAME': 1.0, 'eval_f5-USERNAME': 1.0, 'eval_p-EMAIL': 1.0, 'eval_r-EMAIL': 1.0, 'eval_f5-EMAIL': 1.0, 'eval_p-ID_NUM': 1.0, 'eval_r-ID_NUM': 1.0, 'eval_f5-ID_NUM': 1.0, 'eval_p-NAME_STUDENT': 0.8808510638297873, 'eval_r-NAME_STUDENT': 1.0, 'eval_f5-NAME_STUDENT': 0.9948243992606285, 'eval_p-URL_PERSONAL': 0.84375, 'eval_r-URL_PERSONAL': 1.0, 'eval_f5-URL_PERSONAL': 0.9929278642149929, 'eval_p-PHONE_NUM': 1.0, 'eval_r-PHONE_NUM': 1.0, 'eval_f5-PHONE_NUM': 1.0, 'eval_runtime': 34.7939, 'eval_samples_per_second': 32.592, 'eval_steps_per_second': 32.592, 'epoch': 3.56}
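
The `eval_f5` values in the logs above are consistent with the standard F-beta formula at beta=5, which weights recall far more heavily than precision. Below is a minimal sketch that recomputes it from the logged precision and recall. The helper name `fbeta` is my own for illustration and is not the notebook's `MetricsComputer`. Returning 0.0 when the denominator is zero is an assumption, chosen because it matches the logged `STREET_ADDRESS` rows.

```python
def fbeta(precision: float, recall: float, beta: float = 5.0) -> float:
    """Standard F-beta score; beta > 1 favours recall over precision."""
    denom = beta**2 * precision + recall
    if denom == 0:
        # Assumption: report 0.0 when both precision and recall are 0
        return 0.0
    return (1 + beta**2) * precision * recall / denom


# (eval_precision, eval_recall, eval_f5) triples copied from the eval logs
logged = [
    (0.798941798941799, 0.9762931034482759, 0.9680282731979944),  # epoch 1.78
    (0.9188118811881189, 1.0, 0.9966129698471705),                # epoch 3.26
    (0.8838095238095238, 1.0, 0.9949690721649485),                # epoch 3.56
]

for p, r, f5 in logged:
    # The recomputed score should agree with the logged one up to float noise
    assert abs(fbeta(p, r) - f5) < 1e-6
    print(f"precision={p:.4f} recall={r:.4f} f5={fbeta(p, r):.6f}")
```

With beta=5 a drop in recall costs much more than a drop in precision. This is why the epoch 3.11 evaluation reports `eval_f5-URL_PERSONAL` of about 0.784 despite perfect precision: recall there is only about 0.778.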


In [None]:

# Print the tokenized sequence length of training samples 60 to 74
for i in range(60, 75):
    print(train_ds[i]['length'])

595
928
916
520
813
703
1293
296
1555
880
459
601
781
682
1086


In [None]:
# Sequence length of a single training sample (index 69)
train_ds[69]['length']


880