# Training HF
>Tune the transformer pretrained model using the hugging face methods
#### In this notebook we will tune the pretrained transformer `distilbert-base-cased` using the method `trainer()` from hugging face. We will tune two models: System-A model `rise-ner-distilbert-base-cased-system-a-v1` which is the model tuned based on the whole set of NER tags and System-B model `rise-ner-distilbert-base-cased-system-b-v1`which is when the model tuned based on the selected list of NER tags `{0: 'O', 1: 'B-PER', 2: 'I-PER', 3: 'B-ORG', 4: 'I-ORG', 5: 'B-LOC', 6: 'I-LOC', 7: 'B-ANIM', 8: 'I-ANIM', 13: 'B-DIS', 14: 'I-DIS'}`. `v1` in the last part of the models name refers to the usgae of huggin face `trainer()` method for tuning the models, since `v2` refers to tuning the models using hugging face methods as in `RISE-NER-Final-Pytorch-Training.ipynb`

In [2]:
import os


os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

### we read the dataset of [MULTINER](https://huggingface.co/datasets/Babelscape/multinerd) from hugging face. which contains three sets: `train dataset`, `validation dataset`,'test dataset'. To tune the model, we will use the training set for training and test set for validate the fine tuned model

In [3]:
import datasets

multinerd = datasets.load_dataset("Babelscape/multinerd")
multinerd

Resolving data files:   0%|          | 0/20 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/20 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/20 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['tokens', 'ner_tags', 'lang'],
        num_rows: 2678400
    })
    validation: Dataset({
        features: ['tokens', 'ner_tags', 'lang'],
        num_rows: 334800
    })
    test: Dataset({
        features: ['tokens', 'ner_tags', 'lang'],
        num_rows: 335986
    })
})

### We will exclude all non English samples from the `train` and `test` dataset

In [4]:
# Filter the dataset to keep only the English language samples
en_train = multinerd['train'].filter(lambda example: example['lang'] == 'en')
en_test = multinerd['test'].filter(lambda example: example['lang'] == 'en')

### this is the dictionary of `ner-tags` and the corresponding labels. This dictionary is form the hugging face multinerd dataset

In [5]:
ner_mapping = {
    0:"O",
    1:"B-PER",
    2:"I-PER",
    3:"B-ORG",
    4:"I-ORG",
    5:"B-LOC",
    6:"I-LOC",
    7:"B-ANIM",
    8:"I-ANIM",
    9:"B-BIO",
    10:"I-BIO",
    11:"B-CEL",
    12:"I-CEL",
    13:"B-DIS",
    14:"I-DIS",
    15:"B-EVE",
    16:"I-EVE",
    17:"B-FOOD",
    18:"I-FOOD",
    19:"B-INST",
    20:"I-INST",
    21:"B-MEDIA",
    22:"I-MEDIA",
    23:"B-MYTH",
    24:"I-MYTH",
    25:"B-PLANT",
    26:"I-PLANT",
    27:"B-TIME",
    28:"I-TIME",
    29:"B-VEHI",
    30:"I-VEHI",
  }

### We used the pretrained transformer `distilbert-base-cased` from the huggin face. Thus we first set the configuration of model and instantiate the `AutoTokenizer` of the model, then we define the `model` object. 
### The number of labels of the model is 31 since in this setting we will include all `ner-tags`

In [8]:
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForTokenClassification,DataCollatorForTokenClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name_or_path = "distilbert-base-cased"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
config = AutoConfig.from_pretrained(model_name_or_path, **{
    'id2label': ner_mapping,
    'label2id': {v:k for k, v in ner_mapping.items()}
})
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path, config=config).to(device)
print(f"config: {model.config}")
print(f"num_labels: {model.config.num_labels}")

2023-11-30 21:59:50.246874: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-30 21:59:50.450784: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-30 21:59:51.711763: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /apps/Common/software/code-server/4.9.1/lib:/apps/Common/software/CUDA/11.3.1/nvvm/lib6

config: DistilBertConfig {
  "_name_or_path": "distilbert-base-cased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "O",
    "1": "B-PER",
    "2": "I-PER",
    "3": "B-ORG",
    "4": "I-ORG",
    "5": "B-LOC",
    "6": "I-LOC",
    "7": "B-ANIM",
    "8": "I-ANIM",
    "9": "B-BIO",
    "10": "I-BIO",
    "11": "B-CEL",
    "12": "I-CEL",
    "13": "B-DIS",
    "14": "I-DIS",
    "15": "B-EVE",
    "16": "I-EVE",
    "17": "B-FOOD",
    "18": "I-FOOD",
    "19": "B-INST",
    "20": "I-INST",
    "21": "B-MEDIA",
    "22": "I-MEDIA",
    "23": "B-MYTH",
    "24": "I-MYTH",
    "25": "B-PLANT",
    "26": "I-PLANT",
    "27": "B-TIME",
    "28": "I-TIME",
    "29": "B-VEHI",
    "30": "I-VEHI"
  },
  "initializer_range": 0.02,
  "label2id": {
    "B-ANIM": 7,
    "B-BIO": 9,
    "B-CEL": 11,
    "B-DIS": 13,
    "B-EVE": 15,
    "B-FOOD": 17,
    "B-

### The tokenizer adds two special tokens `['CLS'], ['SEP']`(see below example), thus the number of token will be more than the number of corresponding labels of `ner-tags`. To handle this issue we will assign `-100` for the special tokens, which eventually will be ignored by torch. 
### Morover, some words can be divided into sub-words (e.g. `'Lusiardo'` is splitted into `'Lu'`, `'##sia'`, `'##rdo'`). Thus we have to handle this sub words tokens tags. Thus, according to IOB, the main word (root) token will be assigned with `B-tag` like `B-PER`, and the derived sub-words tokens will be tagged with `I-tag` like `I-PER` (in the below example all the sub-words of `'Lusiardo'` got the tags `'B-PER'` since this is the surname where the first get the `'I-PER'`). Please see the example in `RISE-NER-Final-HF-Inference.ipynb`

### Thus The below function `tokenize_and_align_labels` does 3 jobs

1. Set –100 as the label for these special tokens and the subwords to mask during training
2. Mask the subword representations after the first subword with `I-tag`. If we set `label_all_tokens` attribute to False, then the sub-word tokens will be assigned to -100. However, for this experiment we set `label_all_tokens`to the default value of `True` to tag each sub-word token after the first sub-word to `I-tag`
3. Handle the selection of specific tags by `labels_need_to_keep` attribute. The samples for the excluded tags will remain in the dataset but we will assign a zero tag label. 


### Then the labels are aligned  with the token ids using the picked strategy:

In [6]:
def tokenize_and_align_labels(examples, label2id, id2label, label_all_tokens=True, labels_need_to_keep=None):
    labels_need_to_keep = [] if not isinstance(labels_need_to_keep, list) else labels_need_to_keep
    
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if len(labels_need_to_keep) > 0:
                if word_idx and label[word_idx] in labels_need_to_keep:
                    must_be_kept = True
                else:
                    must_be_kept = False
            else:
                must_be_kept = True

            if word_idx is None:
                # Set –100 for special tokens
                label_ids.append(-100)
            elif word_idx != previous_word_idx:
                # For the first subword token, use the original label
                label_ids.append(label[word_idx] if must_be_kept else 0)
            else:   
                # For subsequent subword tokens
                if label_all_tokens:
                    # Change B- tags to I- tags for subword tokens
                    original_label = id2label[label[word_idx]]
                    if original_label.startswith("B-"):
                        subword_label = "I" + original_label[1:]  # Change B- to I-
                        label_ids.append(label2id[subword_label] if must_be_kept else 0)
                    else:
                        label_ids.append(label[word_idx] if must_be_kept else 0)
                else:
                    label_ids.append(-100)  # Ignore subword tokens

            previous_word_idx = word_idx
        labels.append(label_ids)
    tokenized_inputs["labels"] = labels
    return tokenized_inputs

In [10]:
sample = en_train.select([2])
print(sample[0])
print("----")
ds_sample = sample.map(lambda x: tokenize_and_align_labels(x, label2id=model.config.label2id, id2label=model.config.id2label), batched=True)
print(ds_sample[0])
print("----")
print([(tokenizer.decode(id), lbl) for id, lbl in zip(ds_sample[0]["input_ids"], ds_sample[0]["labels"])])

{'tokens': ['The', 'film', 'starred', 'Tito', 'Lusiardo', 'and', 'a', '19-year', '-', 'old', 'Amelia', 'Bence', '.'], 'ner_tags': [0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 2, 0], 'lang': 'en'}
----


Map:   0%|          | 0/1 [00:00<?, ? examples/s]

{'tokens': ['The', 'film', 'starred', 'Tito', 'Lusiardo', 'and', 'a', '19-year', '-', 'old', 'Amelia', 'Bence', '.'], 'ner_tags': [0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 2, 0], 'lang': 'en', 'input_ids': [101, 1109, 1273, 4950, 22754, 14557, 6370, 16525, 1105, 170, 1627, 118, 1214, 118, 1385, 11691, 3096, 2093, 119, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [-100, 0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, -100]}
----
[('[CLS]', -100), ('The', 0), ('film', 0), ('starred', 0), ('Tito', 1), ('Lu', 2), ('##sia', 2), ('##rdo', 2), ('and', 0), ('a', 0), ('19', 0), ('-', 0), ('year', 0), ('-', 0), ('old', 0), ('Amelia', 1), ('Ben', 2), ('##ce', 2), ('.', 0), ('[SEP]', -100)]


### Now we will apply the function `tokenize_and_align_labels` on the `training` and `testing` datasets.

In [11]:
train_dataset = en_train.map(lambda x: tokenize_and_align_labels(x, label2id=model.config.label2id, id2label=model.config.id2label), batched=True)
test_dataset = en_test.map(lambda x: tokenize_and_align_labels(x, label2id=model.config.label2id, id2label=model.config.id2label), batched=True)

Map:   0%|          | 0/262560 [00:00<?, ? examples/s]

Map:   0%|          | 0/32908 [00:00<?, ? examples/s]

### To evaluate the model we use seqeval metric from huggin face. The seqeval package expects the predictions and labels as lists of lists, with each list corresponding to a single example in test sets. To integrate these metrics during training, we need a function that can take the outputs of the model and convert them into the lists that seqeval expects. The following does the trick by ensuring we ignore the label IDs associated with subsequent subwords:

### Thus we define the function `Compute Metrics`

This compute_metrics() function first takes the argmax of the logits to convert them to predictions (as usual, the logits and the probabilities are in the same order, so we don’t need to apply the softmax). Then we have to convert both labels and predictions from integers to strings. We remove all the values where the label is -100, then pass the results to the metric.compute() method. The function computes precision, recall, F1 score and accuracy.

```

Parameters:
    eval_preds (tuple): A tuple containing the predicted logits and the true labels.

Returns:
    A dictionary containing the precision, recall, F1 score and accuracy.
```

In [12]:
import numpy as np

metric = datasets.load_metric("seqeval")
label_list = model.config.id2label

def compute_metrics(eval_preds):
    pred_logits, labels = eval_preds

    # Apply softmax and then argmax on the last dimension to get the predicted label IDs
    pred_logits = np.argmax(pred_logits, axis=-1)

    predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(pred_logits, labels)
    ]

    true_labels = [
        [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(pred_logits, labels)
    ]

    results = metric.compute(predictions=predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }

  metric = datasets.load_metric("seqeval")


### now we will define all the training configurations that we will pass to the `trainer()`. The `evaluation_strategy` is set to `epoch`means that the model is evaluated at the end of each epoch. This is useful for tracking the model's performance over time and for early stopping. The batch size is 32 and we tune the model for 2 epochs

In [13]:
# defining the training argument parameters
from transformers import DataCollatorForTokenClassification, TrainingArguments, Trainer


args = TrainingArguments(
  "rise-ner-distilbert-base-cased-system-a-v1",
  evaluation_strategy="epoch", 
  learning_rate=2e-5,
  per_device_train_batch_size=32,
  per_device_eval_batch_size=32,
  num_train_epochs=2,
  weight_decay=0.01,
)

data_collator = DataCollatorForTokenClassification(tokenizer)

now we pass the training arguments and datasets, beside the model and tokenizer, as well `data_collator` (used for padding) to the trainer objec. Then we train using `trainer.train()`

In [14]:
trainer = Trainer(
    model,
    args,
    train_dataset= train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.t±rain()

Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,0.0462,0.047967,0.913047,0.941743,0.927173,0.981788
2,0.0262,0.052203,0.919892,0.947046,0.933272,0.982577


TrainOutput(global_step=16410, training_loss=0.05259433179131228, metrics={'train_runtime': 1867.5615, 'train_samples_per_second': 281.179, 'train_steps_per_second': 8.787, 'total_flos': 8930287549116480.0, 'train_loss': 0.05259433179131228, 'epoch': 2.0})

### The whole traing and validation process took 31 minutes. That is due to the big size of dataset and the complexity of the model. 
### if we look at the model we see a slight improvement in the second epoch regarding the f1-score however, the `validation loss` for the first epoch is better with marginal difference. 

>Now we save the model and upload it to the hugging face repository. Since the model is trained using hugging face, we add `-v1` at the end of the model name. 

In [17]:
from huggingface_hub import HfApi, create_repo

model_path = "rise-ner-distilbert-base-cased-system-a-v1"
repo_id = f"petersamoaa/{model_path}"

model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

api = HfApi()
create_repo(repo_id, repo_type="model", private=True)
api.upload_folder(
    folder_path=model_path,
    repo_id=repo_id,
    repo_type="model"
)

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

Upload 162 LFS files:   0%|          | 0/162 [00:00<?, ?it/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

events.out.tfevents.1701378073.alvis3-20.508280.0:   0%|          | 0.00/11.6k [00:00<?, ?B/s]

'https://huggingface.co/petersamoaa/rise-ner-distilbert-base-cased-system-a-v1/tree/main/'

### System_B

In [7]:
ner_mapping = {
    0:"O",
    1:"B-PER",
    2:"I-PER",
    3:"B-ORG",
    4:"I-ORG",
    5:"B-LOC",
    6:"I-LOC",
    7:"B-ANIM",
    8:"I-ANIM",
    9:"B-BIO",
    10:"I-BIO",
    11:"B-CEL",
    12:"I-CEL",
    13:"B-DIS",
    14:"I-DIS",
    15:"B-EVE",
    16:"I-EVE",
    17:"B-FOOD",
    18:"I-FOOD",
    19:"B-INST",
    20:"I-INST",
    21:"B-MEDIA",
    22:"I-MEDIA",
    23:"B-MYTH",
    24:"I-MYTH",
    25:"B-PLANT",
    26:"I-PLANT",
    27:"B-TIME",
    28:"I-TIME",
    29:"B-VEHI",
    30:"I-VEHI",
  }

labels_to_pick = ["PER", "ORG", "LOC", "DIS", "ANIM", "O"]
new_ner_mapping = {v: k for v, k in ner_mapping.items() if k.split("-")[-1] in labels_to_pick}
new_ner_mapping

{0: 'O',
 1: 'B-PER',
 2: 'I-PER',
 3: 'B-ORG',
 4: 'I-ORG',
 5: 'B-LOC',
 6: 'I-LOC',
 7: 'B-ANIM',
 8: 'I-ANIM',
 13: 'B-DIS',
 14: 'I-DIS'}

In [8]:
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForTokenClassification,DataCollatorForTokenClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name_or_path = "distilbert-base-cased"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
config = AutoConfig.from_pretrained(model_name_or_path, **{
    'id2label': ner_mapping,
    'label2id': {v:k for k, v in ner_mapping.items()}
})
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path, config=config).to(device)
print(f"config: {model.config}")
print(f"num_labels: {model.config.num_labels}")

2023-12-01 00:08:00.988147: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-01 00:08:01.302373: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-01 00:08:02.601189: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /apps/Common/software/code-server/4.9.1/lib:/apps/Common/software/CUDA/11.3.1/nvvm/lib6

config: DistilBertConfig {
  "_name_or_path": "distilbert-base-cased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "O",
    "1": "B-PER",
    "2": "I-PER",
    "3": "B-ORG",
    "4": "I-ORG",
    "5": "B-LOC",
    "6": "I-LOC",
    "7": "B-ANIM",
    "8": "I-ANIM",
    "9": "B-BIO",
    "10": "I-BIO",
    "11": "B-CEL",
    "12": "I-CEL",
    "13": "B-DIS",
    "14": "I-DIS",
    "15": "B-EVE",
    "16": "I-EVE",
    "17": "B-FOOD",
    "18": "I-FOOD",
    "19": "B-INST",
    "20": "I-INST",
    "21": "B-MEDIA",
    "22": "I-MEDIA",
    "23": "B-MYTH",
    "24": "I-MYTH",
    "25": "B-PLANT",
    "26": "I-PLANT",
    "27": "B-TIME",
    "28": "I-TIME",
    "29": "B-VEHI",
    "30": "I-VEHI"
  },
  "initializer_range": 0.02,
  "label2id": {
    "B-ANIM": 7,
    "B-BIO": 9,
    "B-CEL": 11,
    "B-DIS": 13,
    "B-EVE": 15,
    "B-FOOD": 17,
    "B-

In [10]:
sample = en_train.select([149])
print(sample[0])
print("----")
ds_sample = sample.map(lambda x: tokenize_and_align_labels(x, label2id=model.config.label2id, id2label=model.config.id2label, labels_need_to_keep=list(new_ner_mapping.keys())), batched=True)
print(ds_sample[0])
print("----")
# print([(tokenizer.decode(id), lbl) for id, lbl in zip(ds_sample[0]["input_ids"], ds_sample[0]["labels"])])

{'tokens': ['It', 'is', 'often', 'marinated', 'with', 'garlic', ',', 'and', 'accompanied', 'by', 'soju', '.'], 'ner_tags': [0, 0, 0, 0, 0, 25, 0, 0, 0, 0, 17, 0], 'lang': 'en'}
----


Map:   0%|          | 0/1 [00:00<?, ? examples/s]

{'tokens': ['It', 'is', 'often', 'marinated', 'with', 'garlic', ',', 'and', 'accompanied', 'by', 'soju', '.'], 'ner_tags': [0, 0, 0, 0, 0, 25, 0, 0, 0, 0, 17, 0], 'lang': 'en', 'input_ids': [101, 1135, 1110, 1510, 12477, 9324, 1906, 1114, 24861, 117, 1105, 4977, 1118, 1177, 9380, 119, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [-100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -100]}
----


In [11]:
train_dataset = en_train.map(lambda x: tokenize_and_align_labels(x, label2id=model.config.label2id, id2label=model.config.id2label, labels_need_to_keep=list(new_ner_mapping.keys())), batched=True)
test_dataset = en_test.map(lambda x: tokenize_and_align_labels(x, label2id=model.config.label2id, id2label=model.config.id2label, labels_need_to_keep=list(new_ner_mapping.keys())), batched=True)

Map:   0%|          | 0/262560 [00:00<?, ? examples/s]

Map:   0%|          | 0/32908 [00:00<?, ? examples/s]

In [12]:
import numpy as np

metric = datasets.load_metric("seqeval")
label_list = model.config.id2label

def compute_metrics(eval_preds):
    pred_logits, labels = eval_preds

    # Apply softmax and then argmax on the last dimension to get the predicted label IDs
    pred_logits = np.argmax(pred_logits, axis=-1)

    predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(pred_logits, labels)
    ]

    true_labels = [
        [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(pred_logits, labels)
    ]

    results = metric.compute(predictions=predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }

  metric = datasets.load_metric("seqeval")


In [13]:
# defining the training argument parameters
from transformers import DataCollatorForTokenClassification, TrainingArguments, Trainer


args = TrainingArguments(
  "rise-ner-distilbert-base-cased-system-b-v1",
  evaluation_strategy="epoch", #The model is evaluated at the end of each epoch. This is useful for tracking the model's performance over time and for early stopping.
  learning_rate=2e-5,
  per_device_train_batch_size=32,
  per_device_eval_batch_size=32,
  num_train_epochs=2,
  weight_decay=0.01,
)

data_collator = DataCollatorForTokenClassification(tokenizer)

In [14]:
trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()

Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy
1,0.0238,0.029933,0.943047,0.966368,0.954565,0.98862
2,0.0116,0.033485,0.952251,0.967158,0.959647,0.989319


TrainOutput(global_step=16410, training_loss=0.028868427718288646, metrics={'train_runtime': 1891.1059, 'train_samples_per_second': 277.679, 'train_steps_per_second': 8.677, 'total_flos': 8930287549116480.0, 'train_loss': 0.028868427718288646, 'epoch': 2.0})

>The copied results from System A
```bash
Epoch   Training Loss   Validation Loss Precision	Recall	        F1	        Accuracy
    1	0.046200	    0.047967	    0.913047	0.941743	0.927173	0.981788
    2	0.026200	    0.052203	    0.919892	0.947046	0.933272	0.982577
```

#### based on results for both models, it seems that system B is some how better in terms of F1-score and `validation loss`. That explain that less labels can make the model more efficient despite the fact that assigning zero to the excluded class will increase the number of samples with zero which leads to class imbalancing.  

In [15]:
from huggingface_hub import HfApi, create_repo

model_path = "rise-ner-distilbert-base-cased-system-b-v1"
repo_id = f"petersamoaa/{model_path}"

model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

api = HfApi()
create_repo(repo_id, repo_type="model", private=True)
api.upload_folder(
    folder_path=model_path,
    repo_id=repo_id,
    repo_type="model"
)

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

Upload 162 LFS files:   0%|          | 0/162 [00:00<?, ?it/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

optimizer.pt:   0%|          | 0.00/522M [00:00<?, ?B/s]

rng_state.pth:   0%|          | 0.00/14.6k [00:00<?, ?B/s]

scheduler.pt:   0%|          | 0.00/627 [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

events.out.tfevents.1701385827.alvis3-20.527901.0:   0%|          | 0.00/11.6k [00:00<?, ?B/s]

'https://huggingface.co/petersamoaa/rise-ner-distilbert-base-cased-system-b-v1/tree/main/'