# Masked Language Modeling

This notebook shows how to pre-train your own AntiBERTa model with the HuggingFace framework. As a demo, we include the tokenizer we used and 1% of the sequences from the training, validation, and test sets used in the paper.

## Setup

In [13]:
# Imports used throughout the notebook
from transformers import (
    RobertaConfig,
    RobertaTokenizer,
    RobertaForMaskedLM,
    DataCollatorForLanguageModeling,
    TrainingArguments,
    Trainer,
)
from datasets import load_dataset
import os

In [14]:
# Initialise the tokeniser
tokenizer = RobertaTokenizer.from_pretrained(
    "../antiberta/antibody-tokenizer"
)

# Initialise the data collator, which is necessary for batching
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

loading file vocab.json
loading file merges.txt
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'BertTokenizer'. 
The class this function is called from is 'RobertaTokenizer'.
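To make the collator's role concrete, here is a minimal pure-Python sketch of the masking rule that `DataCollatorForLanguageModeling` applies when `mlm=True`: each non-special token is selected with probability `mlm_probability`; of the selected tokens, roughly 80% become the mask token, 10% become a random token, and 10% are left unchanged. Unselected positions get the label `-100` so the loss ignores them. The function name `mask_tokens`, the token ids, and `mask_id=4` are illustrative assumptions, not values taken from the AntiBERTa tokenizer.

```python
import random

IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def mask_tokens(input_ids, special_tokens_mask, mask_id, vocab_size,
                mlm_probability=0.15, rng=None):
    """Return (masked_inputs, labels) following the 80/10/10 MLM rule."""
    rng = rng or random.Random()
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, (tok, is_special) in enumerate(zip(input_ids, special_tokens_mask)):
        # Special tokens are never selected; others are selected at random
        if is_special or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = mask_id                    # replace with the mask token
        elif roll < 0.9:
            masked[i] = rng.randrange(vocab_size)  # replace with a random token
        # otherwise keep the original token in the input
    return masked, labels


# Hypothetical token ids: 0 = start token, 2 = end token, 4 = mask token
ids = [0, 5, 6, 7, 8, 9, 2]
special = [1, 0, 0, 0, 0, 0, 1]
masked, labels = mask_tokens(ids, special, mask_id=4, vocab_size=25,
                             rng=random.Random(0))
print(masked, labels)
```

Because the selection is redrawn every time a batch is built, the same sequence is masked differently across epochs.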


## Text data preprocessing

In [15]:
text_datasets = {
    "train": ['../antiberta/assets/train-slice.txt'],
    "eval": ['../antiberta/assets/val-slice.txt'],
    "test": ['../antiberta/assets/test-slice.txt']
}

dataset = load_dataset("text", data_files=text_datasets)
tokenized_dataset = dataset.map(
    lambda z: tokenizer(
        z["text"],
        padding="max_length",
        truncation=True,
        max_length=150,
        return_special_tokens_mask=True,
    ),
    batched=True,
    num_proc=1,
    remove_columns=["text"],
)

Using custom data configuration default-4099dca2205c4257
Found cached dataset text (/Users/joseph/.cache/huggingface/datasets/text/default-4099dca2205c4257/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)


  0%|          | 0/3 [00:00<?, ?it/s]

Loading cached processed dataset at /Users/joseph/.cache/huggingface/datasets/text/default-4099dca2205c4257/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2/cache-5c6d18c0ea3c8c1f.arrow
Loading cached processed dataset at /Users/joseph/.cache/huggingface/datasets/text/default-4099dca2205c4257/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2/cache-85b0fa99574ce78c.arrow


  0%|          | 0/1 [00:00<?, ?ba/s]
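The tokenizer call above pads or truncates every sequence to `max_length=150` and also returns a `special_tokens_mask`. As a rough illustration of what those fields contain, the sketch below reproduces the same shape logic in plain Python. It assumes the tokenizer adds exactly one start and one end token per sequence; `pad_or_truncate` and all token ids are hypothetical.

```python
def pad_or_truncate(token_ids, max_length, pad_id, cls_id, sep_id):
    """Wrap a sequence in special tokens, then truncate or pad to max_length."""
    body = token_ids[: max_length - 2]  # leave room for the two special tokens
    ids = [cls_id] + body + [sep_id]
    special = [1] + [0] * len(body) + [1]
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,  # 0 marks padding
        "special_tokens_mask": special + [1] * n_pad,    # padding counts as special
    }


# Hypothetical ids: 0 = start, 2 = end, 1 = padding
encoded = pad_or_truncate([5, 6, 7, 8, 9], max_length=10, pad_id=1, cls_id=0, sep_id=2)
print(encoded["input_ids"])            # [0, 5, 6, 7, 8, 9, 2, 1, 1, 1]
print(encoded["special_tokens_mask"])  # [1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

The data collator uses `special_tokens_mask` to avoid masking start, end, and padding positions, which is why the column is requested here even though the model's `forward` does not consume it.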

## Model configuration

In [16]:
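import torch
# Use Apple's MPS backend when it is available; otherwise fall back to the CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")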

In [17]:
# These are the configurations we used for pre-training.
antiberta_config = {
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "hidden_size": 768,
    "d_ff": 3072,
    "vocab_size": 25,
    "max_len": 150,
    "max_position_embeddings": 152,
    "batch_size": 96,
    "max_steps": 225000,
    "weight_decay": 0.01,
    "peak_learning_rate": 0.0001,
}
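One detail in this configuration that may look odd is `max_position_embeddings` being 152 rather than 150. RoBERTa-style models number positions starting at `padding_idx + 1` and reserve `padding_idx` itself for padding, so a sequence of `max_len` tokens needs `max_len + padding_idx + 1` embedding rows. The sketch below mirrors that numbering in plain Python; it assumes `padding_idx=1`, the `RobertaConfig` default, and the function name is illustrative.

```python
def roberta_position_ids(input_ids, padding_idx=1):
    """Number non-padding tokens from padding_idx + 1; padding keeps padding_idx."""
    position_ids, seen = [], 0
    for tok in input_ids:
        if tok == padding_idx:
            position_ids.append(padding_idx)
        else:
            seen += 1
            position_ids.append(seen + padding_idx)
    return position_ids


max_len = 150
full_sequence = [5] * max_len  # hypothetical ids, none equal to padding_idx
largest_id = max(roberta_position_ids(full_sequence))
print(largest_id)      # 151
print(largest_id + 1)  # 152 embedding rows are needed
```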

In [18]:
# Initialise the model
model_config = RobertaConfig(
    vocab_size=antiberta_config.get("vocab_size"),
    hidden_size=antiberta_config.get("hidden_size"),
    max_position_embeddings=antiberta_config.get("max_position_embeddings"),
    num_hidden_layers=antiberta_config.get("num_hidden_layers", 12),
    num_attention_heads=antiberta_config.get("num_attention_heads", 12),
    type_vocab_size=1,
)
model = RobertaForMaskedLM(model_config).to(device)
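As a sanity check on the architecture, the parameter count can be worked out by hand from the configuration. The sketch below is a back-of-the-envelope calculation, not part of the original notebook: it assumes the standard RoBERTa layer layout, a feed-forward width of 3072 (the `RobertaConfig` default for `intermediate_size`, which matches `d_ff` above), no pooling layer, and an LM head whose output weights are tied to the word embeddings. Under those assumptions the total agrees with the `Number of trainable parameters = 85784857` line in the training log further down.

```python
def linear(n_in, n_out):
    """Parameters in a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def roberta_mlm_param_count(vocab_size, hidden_size, max_position_embeddings,
                            num_hidden_layers, intermediate_size=3072,
                            type_vocab_size=1):
    h, ff = hidden_size, intermediate_size
    layer_norm = 2 * h  # scale and shift vectors
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm
    attention = 4 * linear(h, h) + layer_norm  # query, key, value, output projection
    feed_forward = linear(h, ff) + linear(ff, h) + layer_norm
    encoder = num_hidden_layers * (attention + feed_forward)
    # LM head: dense + layer norm + output bias (output weights are tied)
    lm_head = linear(h, h) + layer_norm + vocab_size
    return embeddings + encoder + lm_head


total = roberta_mlm_param_count(
    vocab_size=25, hidden_size=768, max_position_embeddings=152, num_hidden_layers=12
)
print(total)  # 85784857
```

With a 25-token vocabulary the embedding table is tiny, so almost all of the parameters sit in the 12 encoder layers.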

In [19]:
# Construct the training arguments
# HuggingFace uses a default seed of 42
args = TrainingArguments(
    output_dir="test",
    overwrite_output_dir=True,
    per_device_train_batch_size=antiberta_config.get("batch_size", 32),
    per_device_eval_batch_size=antiberta_config.get("batch_size", 32),
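    max_steps=antiberta_config.get("max_steps", 225000),
    save_steps=2500,
    logging_steps=2500,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    weight_decay=antiberta_config.get("weight_decay", 0.01),
    warmup_steps=10000,
    learning_rate=antiberta_config.get("peak_learning_rate", 1e-4),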
    gradient_accumulation_steps=antiberta_config.get("gradient_accumulation_steps", 1),
    # fp16=True,
    evaluation_strategy="steps",
    seed=42
)

using `logging_steps` to initialize `eval_steps` to 2500
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
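With these arguments the `Trainer` uses its default linear schedule: the learning rate rises linearly from 0 to the peak over `warmup_steps`, then decays linearly to 0 at `max_steps`. The following sketch reimplements that rule in plain Python so the logged learning rates can be checked by hand; the function name is an assumption, while the default values are those passed to `TrainingArguments` above.

```python
import math


def linear_schedule_lr(step, peak_lr=1e-4, warmup_steps=10_000, max_steps=225_000):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max_steps - step
    return peak_lr * max(0.0, remaining / max(1, max_steps - warmup_steps))


# Compare against two learning rates reported in the training log
print(linear_schedule_lr(152_500))  # approximately 3.3721e-05
print(linear_schedule_lr(210_000))  # approximately 6.9767e-06
assert math.isclose(linear_schedule_lr(152_500), 3.372093023255814e-05, rel_tol=1e-9)
```

For example, at step 152500 the rate is 1e-4 × (225000 − 152500) / (225000 − 10000) ≈ 3.37e-05, which matches the value in the log.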


## Setup of the HuggingFace Trainer

In [20]:
trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["eval"]
)

max_steps is given, it will override any value given in num_train_epochs
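The training call in the next cell passes `resume_from_checkpoint=True`, which tells the `Trainer` to continue from the most recent `checkpoint-<step>` folder inside `output_dir` and fails if none exists. For a notebook that should also work on a first run, one option is to look for a checkpoint explicitly and pass the result instead. The helper below is a hypothetical stdlib-only sketch of that lookup; `transformers.trainer_utils` provides a similar `get_last_checkpoint` function.

```python
import os
import re

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir):
    """Return the checkpoint folder with the highest step, or None if there is none."""
    if not os.path.isdir(output_dir):
        return None
    steps = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return os.path.join(output_dir, f"checkpoint-{max(steps)}")


# "test" is the output_dir used in the training arguments above
last_checkpoint = find_last_checkpoint("test")
print(last_checkpoint)
```

The result can then be passed as `trainer.train(resume_from_checkpoint=last_checkpoint)`; a value of `None` starts training from scratch.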


In [21]:
# Resume from the latest checkpoint in output_dir; omit the argument for a fresh run
trainer.train(resume_from_checkpoint=True)

Loading model from test/checkpoint-150000.
The following columns in the training set don't have a corresponding argument in `RobertaForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `RobertaForMaskedLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 3
  Num Epochs = 225000
  Instantaneous batch size per device = 96
  Total train batch size (w. parallel, distributed & accumulation) = 96
  Gradient Accumulation steps = 1
  Total optimization steps = 225000
  Number of trainable parameters = 85784857
  Continuing training from checkpoint, will skip to saved global_step
  Continuing training from epoch 150000
  Continuing training from global step 150000
  Will skip the first 150000 epochs then the first 0 batches in the first epoch. If this takes a lot of time, you can add the `--ignore_data_skip` flag to your launch command, but you will resume the training on data already seen by your 

0it [00:00, ?it/s]

  0%|          | 0/225000 [00:00<?, ?it/s]

The following columns in the evaluation set don't have a corresponding argument in `RobertaForMaskedLM.forward` and have been ignored: special_tokens_mask. If special_tokens_mask are not expected by `RobertaForMaskedLM.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 3.372093023255814e-05, 'epoch': 152500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-152500
Configuration saved in test/checkpoint-152500/config.json


{'eval_loss': nan, 'eval_runtime': 0.2029, 'eval_samples_per_second': 14.782, 'eval_steps_per_second': 4.927, 'epoch': 152500.0}


Model weights saved in test/checkpoint-152500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 3.2558139534883724e-05, 'epoch': 155000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-155000
Configuration saved in test/checkpoint-155000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1691, 'eval_samples_per_second': 17.741, 'eval_steps_per_second': 5.914, 'epoch': 155000.0}


Model weights saved in test/checkpoint-155000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 3.13953488372093e-05, 'epoch': 157500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-157500
Configuration saved in test/checkpoint-157500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1993, 'eval_samples_per_second': 15.056, 'eval_steps_per_second': 5.019, 'epoch': 157500.0}


Model weights saved in test/checkpoint-157500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 3.0232558139534883e-05, 'epoch': 160000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-160000
Configuration saved in test/checkpoint-160000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1768, 'eval_samples_per_second': 16.968, 'eval_steps_per_second': 5.656, 'epoch': 160000.0}


Model weights saved in test/checkpoint-160000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 2.9069767441860467e-05, 'epoch': 162500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-162500
Configuration saved in test/checkpoint-162500/config.json


{'eval_loss': nan, 'eval_runtime': 0.162, 'eval_samples_per_second': 18.513, 'eval_steps_per_second': 6.171, 'epoch': 162500.0}


Model weights saved in test/checkpoint-162500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 2.7906976744186048e-05, 'epoch': 165000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-165000
Configuration saved in test/checkpoint-165000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1565, 'eval_samples_per_second': 19.169, 'eval_steps_per_second': 6.39, 'epoch': 165000.0}


Model weights saved in test/checkpoint-165000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 2.674418604651163e-05, 'epoch': 167500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-167500
Configuration saved in test/checkpoint-167500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1747, 'eval_samples_per_second': 17.175, 'eval_steps_per_second': 5.725, 'epoch': 167500.0}


Model weights saved in test/checkpoint-167500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 2.5581395348837212e-05, 'epoch': 170000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-170000
Configuration saved in test/checkpoint-170000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1535, 'eval_samples_per_second': 19.543, 'eval_steps_per_second': 6.514, 'epoch': 170000.0}


Model weights saved in test/checkpoint-170000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 2.441860465116279e-05, 'epoch': 172500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-172500
Configuration saved in test/checkpoint-172500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1694, 'eval_samples_per_second': 17.713, 'eval_steps_per_second': 5.904, 'epoch': 172500.0}


Model weights saved in test/checkpoint-172500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 2.3255813953488374e-05, 'epoch': 175000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

{'eval_loss': nan, 'eval_runtime': 0.1976, 'eval_samples_per_second': 15.184, 'eval_steps_per_second': 5.061, 'epoch': 175000.0}

Saving model checkpoint to test/checkpoint-175000
Configuration saved in test/checkpoint-175000/config.json





Model weights saved in test/checkpoint-175000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 2.2093023255813955e-05, 'epoch': 177500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-177500
Configuration saved in test/checkpoint-177500/config.json


{'eval_loss': nan, 'eval_runtime': 0.2014, 'eval_samples_per_second': 14.899, 'eval_steps_per_second': 4.966, 'epoch': 177500.0}


Model weights saved in test/checkpoint-177500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 2.0930232558139536e-05, 'epoch': 180000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-180000
Configuration saved in test/checkpoint-180000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1582, 'eval_samples_per_second': 18.968, 'eval_steps_per_second': 6.323, 'epoch': 180000.0}


Model weights saved in test/checkpoint-180000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.9767441860465116e-05, 'epoch': 182500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-182500
Configuration saved in test/checkpoint-182500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1807, 'eval_samples_per_second': 16.598, 'eval_steps_per_second': 5.533, 'epoch': 182500.0}


Model weights saved in test/checkpoint-182500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.8604651162790697e-05, 'epoch': 185000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-185000
Configuration saved in test/checkpoint-185000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1675, 'eval_samples_per_second': 17.915, 'eval_steps_per_second': 5.972, 'epoch': 185000.0}


Model weights saved in test/checkpoint-185000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.744186046511628e-05, 'epoch': 187500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-187500
Configuration saved in test/checkpoint-187500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1695, 'eval_samples_per_second': 17.697, 'eval_steps_per_second': 5.899, 'epoch': 187500.0}


Model weights saved in test/checkpoint-187500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.6279069767441862e-05, 'epoch': 190000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-190000
Configuration saved in test/checkpoint-190000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1674, 'eval_samples_per_second': 17.921, 'eval_steps_per_second': 5.974, 'epoch': 190000.0}


Model weights saved in test/checkpoint-190000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.5116279069767441e-05, 'epoch': 192500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-192500
Configuration saved in test/checkpoint-192500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1722, 'eval_samples_per_second': 17.424, 'eval_steps_per_second': 5.808, 'epoch': 192500.0}


Model weights saved in test/checkpoint-192500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.3953488372093024e-05, 'epoch': 195000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-195000
Configuration saved in test/checkpoint-195000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1594, 'eval_samples_per_second': 18.819, 'eval_steps_per_second': 6.273, 'epoch': 195000.0}


Model weights saved in test/checkpoint-195000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.2790697674418606e-05, 'epoch': 197500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-197500
Configuration saved in test/checkpoint-197500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1758, 'eval_samples_per_second': 17.063, 'eval_steps_per_second': 5.688, 'epoch': 197500.0}


Model weights saved in test/checkpoint-197500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.1627906976744187e-05, 'epoch': 200000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-200000
Configuration saved in test/checkpoint-200000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1808, 'eval_samples_per_second': 16.592, 'eval_steps_per_second': 5.531, 'epoch': 200000.0}


Model weights saved in test/checkpoint-200000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 1.0465116279069768e-05, 'epoch': 202500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-202500
Configuration saved in test/checkpoint-202500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1822, 'eval_samples_per_second': 16.469, 'eval_steps_per_second': 5.49, 'epoch': 202500.0}


Model weights saved in test/checkpoint-202500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 9.302325581395349e-06, 'epoch': 205000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-205000
Configuration saved in test/checkpoint-205000/config.json


{'eval_loss': nan, 'eval_runtime': 0.1794, 'eval_samples_per_second': 16.724, 'eval_steps_per_second': 5.575, 'epoch': 205000.0}


Model weights saved in test/checkpoint-205000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 8.139534883720931e-06, 'epoch': 207500.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-207500
Configuration saved in test/checkpoint-207500/config.json


{'eval_loss': nan, 'eval_runtime': 0.1878, 'eval_samples_per_second': 15.976, 'eval_steps_per_second': 5.325, 'epoch': 207500.0}


Model weights saved in test/checkpoint-207500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3
  Batch size = 96


{'loss': 0.0, 'learning_rate': 6.976744186046512e-06, 'epoch': 210000.0}


  0%|          | 0/1 [00:00<?, ?it/s]

Saving model checkpoint to test/checkpoint-210000
Configuration saved in test/checkpoint-210000/config.json


{'eval_loss': nan, 'eval_runtime': 0.176, 'eval_samples_per_second': 17.041, 'eval_steps_per_second': 5.68, 'epoch': 210000.0}


Model weights saved in test/checkpoint-210000/pytorch_model.bin


In [None]:
trainer.save_model('../antiberta/saved_model')

In [None]:
# Predict MLM performance on the test dataset
out = trainer.predict(tokenized_dataset['test'])
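`trainer.predict` returns an object whose `metrics` dictionary holds the mean masked-token cross-entropy under the key `test_loss` (the default metric prefix is `test`). A common way to summarise MLM performance is to exponentiate that loss into a perplexity. The helper below is a small sketch of that conversion; its name is an assumption, and because masking is random the value covers only the positions masked in this particular pass.

```python
import math


def mlm_perplexity(metrics, key="test_loss"):
    """Convert a mean cross-entropy loss into perplexity; None if the loss is unusable."""
    loss = metrics.get(key)
    if loss is None or math.isnan(loss):
        return None
    return math.exp(loss)


# Illustrative value only: a loss of ln(25) corresponds to guessing uniformly
# among the 25 vocabulary entries
print(mlm_perplexity({"test_loss": math.log(25)}))  # approximately 25.0
```

In the notebook this would be called as `mlm_perplexity(out.metrics)`. A `None` result signals a missing or `nan` loss, as seen in the evaluation entries of the training log above.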