# Fine-tuning a LLM on your own docs

## Natural Language Processing

**Text classification**: the model is trained to predict a label for a given text. Text classification is frequently used for tasks like sentiment analysis, topic classification, and spam detection.

**Token classification**: the model is trained to predict a label for each token in the sequence. Token classification is frequently used for tasks like named entity recognition (NER), part-of-speech tagging, and chunking.

**Question answering**: the model is trained to predict an answer to a question based on a given context. Question answering is frequently used for tasks like question answering, fact verification, and conversational response generation.

**Causal language modeling**: the model is trained to predict the next token in the sequence. Causal language models are frequently used for text generation.

**Masked language modeling**: the model is trained to predict the masked tokens in the sequence. Masked language model can attend to tokens bidirectionally. This means the model has full access to the tokens on the left and right. Masked language modeling is great for tasks that require a good contextual understanding of an entire sequence. BERT is an example of a masked language model.

**Translation**: the model is trained to translate text from one language to another. Translation is frequently used for tasks like machine translation.

**Summarization**: the model is trained to summarize a given text. Summarization is frequently used for tasks like news summarization, article summarization, and book summarization.

**Multiple choice**: the model is trained to predict the correct answer from a list of multiple choice options. 

The goal of this post is to fine-tune a LLM on a custom dataset for causal language modeling. We will use the DistilGPT2 model. DistilGPT2 (short for Distilled-GPT2) is an English-language model pre-trained with the supervision of the smallest version of Generative Pre-trained Transformer 2 (GPT-2). Like GPT-2, DistilGPT2 can be used to generate text. Users of this model card should also consider information about the design, training, and limitations of GPT-2.


## Install libraries


In [None]:
%pip install -U transformers torch tqdm tiktoken markdown

## Load the data

Let's load the ELI5 dataset from the Hugging Face Hub first. Next, we split the dataset into a training and test set, train the model on the training set, evaluate the model on the test set, inference the model on a custom prompt.

Next, we fine-tune the model on our own dataset.

In [1]:
from datasets import load_dataset

eli5 = load_dataset("eli5_category", split="train[:5000]")

eli5 = eli5.train_test_split(test_size=0.2)

  from .autonotebook import tqdm as notebook_tqdm
Downloading data: 100%|██████████| 62.3M/62.3M [00:00<00:00, 66.7MB/s]
Downloading data: 100%|██████████| 5.00M/5.00M [00:00<00:00, 90.5MB/s]
Downloading data: 100%|██████████| 1.76M/1.76M [00:00<00:00, 110MB/s]
Downloading data: 100%|██████████| 3.85M/3.85M [00:00<00:00, 95.6MB/s]
Generating train split: 100%|██████████| 91772/91772 [00:03<00:00, 23245.99 examples/s]
Generating validation1 split: 100%|██████████| 5446/5446 [00:00<00:00, 26980.41 examples/s]
Generating validation2 split: 100%|██████████| 2375/2375 [00:00<00:00, 28917.25 examples/s]
Generating test split: 100%|██████████| 5411/5411 [00:00<00:00, 29139.00 examples/s]


## Preprocess

In [2]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")


eli5 = eli5.flatten()
eli5["train"][0]

{'q_id': '7h2ns1',
 'title': 'How does different data traveling on the same cable not get lost with all the other data?',
 'selftext': '',
 'category': 'Technology',
 'subreddit': 'explainlikeimfive',
 'answers.a_id': ['dqnmb02'],
 'answers.text': ['Data is encapsulated into packets with headers and trailers that identify it. Sometimes it does get lost though. When 2 devices establish a connection, they decide on packet numbering. If I send you a packet that says it contains data 1500 - 1600 you expect that my next packet starts with 1700. If it doesn’t, then you your response to me is essentially “I need 1700”.'],
 'answers.score': [10],
 'answers.text_urls': [[]],
 'title_urls': ['url'],
 'selftext_urls': ['url']}

In [3]:
def preprocess_function(examples):
    return tokenizer([" ".join(x) for x in examples["answers.text"]])

tokenized_eli5 = eli5.map(
    preprocess_function,
    batched=True,
    num_proc=4,
    remove_columns=eli5["train"].column_names,
)

block_size = 128

def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
    # customize this part to your needs.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    # Split by chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)

from transformers import DataCollatorForLanguageModeling

tokenizer.pad_token = tokenizer.eos_token
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

Map (num_proc=4):   0%|          | 0/4000 [00:00<?, ? examples/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1209 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1186 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2070 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1601 > 1024). Running this sequence through the model will result in indexing errors
Map (num_proc=4): 100%|██████████| 4000/4000 [00:00<00:00, 4995.93 examples/s]
Map (num_proc=4):   0%|          | 0/1000 [00:00<?, ? examples/s]Token indices sequence length is longer than the specif

## Train

In [4]:
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")

training_args = TrainingArguments(
    output_dir="my_awesome_eli5_clm-model",
    eval_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.01,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_dataset["train"],
    eval_dataset=lm_dataset["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()

  trainer = Trainer(
 13%|█▎        | 500/3975 [04:26<30:20,  1.91it/s]

{'loss': 3.9787, 'grad_norm': 4.818574905395508, 'learning_rate': 1.748427672955975e-05, 'epoch': 0.38}


 25%|██▌       | 1000/3975 [08:52<25:56,  1.91it/s] 

{'loss': 3.9528, 'grad_norm': 3.8537325859069824, 'learning_rate': 1.4968553459119497e-05, 'epoch': 0.75}


                                                   
 33%|███▎      | 1325/3975 [12:26<23:20,  1.89it/s]

{'eval_loss': 3.8328182697296143, 'eval_runtime': 41.027, 'eval_samples_per_second': 60.497, 'eval_steps_per_second': 7.58, 'epoch': 1.0}


 38%|███▊      | 1500/3975 [14:00<22:34,  1.83it/s]  

{'loss': 3.913, 'grad_norm': 3.8344168663024902, 'learning_rate': 1.2452830188679246e-05, 'epoch': 1.13}


 50%|█████     | 2000/3975 [18:25<16:49,  1.96it/s]

{'loss': 3.8573, 'grad_norm': 4.064958572387695, 'learning_rate': 9.937106918238994e-06, 'epoch': 1.51}


 63%|██████▎   | 2500/3975 [22:48<13:05,  1.88it/s]

{'loss': 3.8496, 'grad_norm': 4.159275531768799, 'learning_rate': 7.421383647798742e-06, 'epoch': 1.89}


                                                   
 67%|██████▋   | 2650/3975 [24:51<11:36,  1.90it/s]

{'eval_loss': 3.823622941970825, 'eval_runtime': 41.3614, 'eval_samples_per_second': 60.008, 'eval_steps_per_second': 7.519, 'epoch': 2.0}


 75%|███████▌  | 3000/3975 [27:54<08:24,  1.93it/s]  

{'loss': 3.83, 'grad_norm': 3.96323823928833, 'learning_rate': 4.905660377358491e-06, 'epoch': 2.26}


 88%|████████▊ | 3500/3975 [32:22<04:15,  1.86it/s]

{'loss': 3.8126, 'grad_norm': 4.447389125823975, 'learning_rate': 2.389937106918239e-06, 'epoch': 2.64}


                                                   
100%|██████████| 3975/3975 [37:30<00:00,  1.77it/s]

{'eval_loss': 3.8222262859344482, 'eval_runtime': 52.5303, 'eval_samples_per_second': 47.249, 'eval_steps_per_second': 5.92, 'epoch': 3.0}
{'train_runtime': 2250.3542, 'train_samples_per_second': 14.131, 'train_steps_per_second': 1.766, 'train_loss': 3.8763028203616354, 'epoch': 3.0}





TrainOutput(global_step=3975, training_loss=3.8763028203616354, metrics={'train_runtime': 2250.3542, 'train_samples_per_second': 14.131, 'train_steps_per_second': 1.766, 'total_flos': 1038654583603200.0, 'train_loss': 3.8763028203616354, 'epoch': 3.0})

## Evaluate

In [5]:
import math

eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

100%|██████████| 311/311 [00:40<00:00,  7.62it/s]

Perplexity: 45.71





## Inference


In [8]:
prompt = "Somatic hypermutation allows the immune system to"

from transformers import pipeline

generator = pipeline("text-generation", model="my_awesome_eli5_clm-model")
generator(prompt)


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'generated_text': 'Somatic hypermutation allows the immune system to adapt to a more drastic response. When the immune system is overwhelmed, the immune system does NOT adapt to new stimuli (an event) and does not respond to new stimuli.This technique is called hyp'}]

## Build our own dataset


In [5]:
from datasets import load_dataset
own_dataset = load_dataset("text", data_files={"train": ["llm-fine-tuning-docs/docs/README.md"]}, split="train")


In [6]:
own_dataset = own_dataset.train_test_split(test_size=0.2)

In [15]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")

def preprocess_function(examples):
    return tokenizer([" ".join(x) for x in examples["text"]])

tokenized_own_dataset = own_dataset.map(
    preprocess_function,
    batched=True,
    num_proc=4,
    remove_columns=own_dataset["train"].column_names,
)

block_size = 128

def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    # Split by chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

lm_dataset = tokenized_own_dataset.map(group_texts, batched=True, num_proc=4)



Map (num_proc=4):  23%|██▎       | 25000/110255 [00:00<00:01, 47341.94 examples/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1265 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2635 > 1024). Running this sequence through the model will result in indexing errors
Map (num_proc=4): 100%|██████████| 110255/110255 [00:02<00:00, 51796.27 examples/s]
Map (num_proc=4):  33%|███▎      | 9000/27564 [00:00<00:00, 33464.95 examples/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1434 > 1024). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1498 > 1024). Running this sequence through the model will result in indexing errors
Map (num_proc=4): 

In [16]:
from transformers import DataCollatorForLanguageModeling

tokenizer.pad_token = tokenizer.eos_token
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

In [17]:
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")

training_args = TrainingArguments(
    output_dir="my_awesome_eli5_clm-model",
    eval_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.01,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_dataset["train"],
    eval_dataset=lm_dataset["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()

  trainer = Trainer(
  2%|▏         | 500/25236 [04:32<3:45:39,  1.83it/s]

{'loss': 1.0514, 'grad_norm': 2.923488140106201, 'learning_rate': 1.9603740687906166e-05, 'epoch': 0.06}


  4%|▍         | 1000/25236 [09:02<3:37:20,  1.86it/s]

{'loss': 0.8607, 'grad_norm': 3.653607130050659, 'learning_rate': 1.9207481375812334e-05, 'epoch': 0.12}


  6%|▌         | 1500/25236 [13:34<3:34:09,  1.85it/s]

{'loss': 0.789, 'grad_norm': 3.275932788848877, 'learning_rate': 1.8811222063718498e-05, 'epoch': 0.18}


  8%|▊         | 2000/25236 [18:06<3:25:18,  1.89it/s]

{'loss': 0.7358, 'grad_norm': 3.0901458263397217, 'learning_rate': 1.8414962751624666e-05, 'epoch': 0.24}


 10%|▉         | 2500/25236 [22:36<3:25:19,  1.85it/s]

{'loss': 0.7202, 'grad_norm': 3.7790260314941406, 'learning_rate': 1.801870343953083e-05, 'epoch': 0.3}


 12%|█▏        | 3000/25236 [27:04<3:14:50,  1.90it/s]

{'loss': 0.6646, 'grad_norm': 2.1665198802948, 'learning_rate': 1.7622444127436998e-05, 'epoch': 0.36}


 14%|█▍        | 3500/25236 [31:32<3:11:37,  1.89it/s]

{'loss': 0.6752, 'grad_norm': 2.3151416778564453, 'learning_rate': 1.7226184815343162e-05, 'epoch': 0.42}


 16%|█▌        | 4000/25236 [36:00<3:07:08,  1.89it/s]

{'loss': 0.6623, 'grad_norm': 4.289205074310303, 'learning_rate': 1.6829925503249326e-05, 'epoch': 0.48}


 18%|█▊        | 4500/25236 [40:27<3:01:44,  1.90it/s]

{'loss': 0.6326, 'grad_norm': 3.1733734607696533, 'learning_rate': 1.6433666191155494e-05, 'epoch': 0.53}


 20%|█▉        | 5000/25236 [44:51<2:53:46,  1.94it/s]

{'loss': 0.6217, 'grad_norm': 3.655308723449707, 'learning_rate': 1.6037406879061658e-05, 'epoch': 0.59}


 22%|██▏       | 5500/25236 [49:12<3:02:40,  1.80it/s]

{'loss': 0.6148, 'grad_norm': 2.426121234893799, 'learning_rate': 1.5641147566967826e-05, 'epoch': 0.65}


 24%|██▍       | 6000/25236 [53:43<2:51:29,  1.87it/s]

{'loss': 0.6033, 'grad_norm': 3.019343137741089, 'learning_rate': 1.524488825487399e-05, 'epoch': 0.71}


 26%|██▌       | 6500/25236 [58:15<2:46:54,  1.87it/s]

{'loss': 0.5964, 'grad_norm': 2.171902894973755, 'learning_rate': 1.4848628942780156e-05, 'epoch': 0.77}


 28%|██▊       | 7000/25236 [1:02:46<2:42:38,  1.87it/s]

{'loss': 0.5933, 'grad_norm': 3.705193281173706, 'learning_rate': 1.4452369630686322e-05, 'epoch': 0.83}


 30%|██▉       | 7500/25236 [1:07:16<2:37:07,  1.88it/s]

{'loss': 0.5997, 'grad_norm': 3.1082398891448975, 'learning_rate': 1.4056110318592488e-05, 'epoch': 0.89}


 32%|███▏      | 8000/25236 [1:11:46<2:34:09,  1.86it/s]

{'loss': 0.5687, 'grad_norm': 2.9610862731933594, 'learning_rate': 1.3659851006498654e-05, 'epoch': 0.95}


                                                        
 33%|███▎      | 8412/25236 [1:20:06<2:47:04,  1.68it/s]

{'eval_loss': 0.5195939540863037, 'eval_runtime': 278.079, 'eval_samples_per_second': 60.123, 'eval_steps_per_second': 7.516, 'epoch': 1.0}


 34%|███▎      | 8500/25236 [1:20:54<2:27:11,  1.90it/s]  

{'loss': 0.5843, 'grad_norm': 3.4731698036193848, 'learning_rate': 1.326359169440482e-05, 'epoch': 1.01}


 36%|███▌      | 9000/25236 [1:25:23<2:22:48,  1.89it/s]

{'loss': 0.5693, 'grad_norm': 2.3894593715667725, 'learning_rate': 1.2867332382310986e-05, 'epoch': 1.07}


 38%|███▊      | 9500/25236 [1:29:53<2:18:56,  1.89it/s]

{'loss': 0.5665, 'grad_norm': 3.145565986633301, 'learning_rate': 1.247107307021715e-05, 'epoch': 1.13}


 40%|███▉      | 10000/25236 [1:34:26<2:14:58,  1.88it/s]

{'loss': 0.5574, 'grad_norm': 2.6279871463775635, 'learning_rate': 1.2074813758123316e-05, 'epoch': 1.19}


 42%|████▏     | 10500/25236 [1:38:58<2:12:20,  1.86it/s]

{'loss': 0.5405, 'grad_norm': 1.9093999862670898, 'learning_rate': 1.1678554446029482e-05, 'epoch': 1.25}


 44%|████▎     | 11000/25236 [1:43:27<2:06:47,  1.87it/s]

{'loss': 0.5507, 'grad_norm': 2.3656275272369385, 'learning_rate': 1.1282295133935648e-05, 'epoch': 1.31}


 46%|████▌     | 11500/25236 [1:47:57<2:05:47,  1.82it/s]

{'loss': 0.5573, 'grad_norm': 2.009424924850464, 'learning_rate': 1.0886035821841814e-05, 'epoch': 1.37}


 48%|████▊     | 12000/25236 [1:52:36<1:59:20,  1.85it/s]

{'loss': 0.5447, 'grad_norm': 2.790330648422241, 'learning_rate': 1.048977650974798e-05, 'epoch': 1.43}


 50%|████▉     | 12500/25236 [1:57:10<1:56:34,  1.82it/s]

{'loss': 0.5304, 'grad_norm': 2.3634917736053467, 'learning_rate': 1.0093517197654146e-05, 'epoch': 1.49}


 52%|█████▏    | 13000/25236 [2:01:46<1:50:19,  1.85it/s]

{'loss': 0.5492, 'grad_norm': 1.8188008069992065, 'learning_rate': 9.697257885560312e-06, 'epoch': 1.55}


 53%|█████▎    | 13500/25236 [2:06:19<1:46:50,  1.83it/s]

{'loss': 0.5377, 'grad_norm': 3.021474838256836, 'learning_rate': 9.300998573466478e-06, 'epoch': 1.6}


 55%|█████▌    | 14000/25236 [2:10:53<1:44:38,  1.79it/s]

{'loss': 0.522, 'grad_norm': 2.7724623680114746, 'learning_rate': 8.904739261372642e-06, 'epoch': 1.66}


 57%|█████▋    | 14500/25236 [2:15:30<1:37:07,  1.84it/s]

{'loss': 0.5476, 'grad_norm': 2.124424695968628, 'learning_rate': 8.508479949278808e-06, 'epoch': 1.72}


 59%|█████▉    | 15000/25236 [2:20:06<1:33:47,  1.82it/s]

{'loss': 0.5324, 'grad_norm': 2.9832992553710938, 'learning_rate': 8.112220637184974e-06, 'epoch': 1.78}


 61%|██████▏   | 15500/25236 [2:24:41<1:30:13,  1.80it/s]

{'loss': 0.5287, 'grad_norm': 2.484278678894043, 'learning_rate': 7.71596132509114e-06, 'epoch': 1.84}


 63%|██████▎   | 16000/25236 [2:29:18<1:22:59,  1.85it/s]

{'loss': 0.5346, 'grad_norm': 2.693070650100708, 'learning_rate': 7.319702012997306e-06, 'epoch': 1.9}


 65%|██████▌   | 16500/25236 [2:33:54<1:20:01,  1.82it/s]

{'loss': 0.5224, 'grad_norm': 1.8972607851028442, 'learning_rate': 6.923442700903472e-06, 'epoch': 1.96}


                                                          
 67%|██████▋   | 16824/25236 [2:41:44<1:08:58,  2.03it/s]

{'eval_loss': 0.4774123430252075, 'eval_runtime': 278.0684, 'eval_samples_per_second': 60.125, 'eval_steps_per_second': 7.516, 'epoch': 2.0}


 67%|██████▋   | 17000/25236 [2:43:20<1:14:20,  1.85it/s]  

{'loss': 0.526, 'grad_norm': 3.3371341228485107, 'learning_rate': 6.527183388809638e-06, 'epoch': 2.02}


 69%|██████▉   | 17500/25236 [2:47:55<1:23:18,  1.55it/s]

{'loss': 0.5141, 'grad_norm': 1.9682505130767822, 'learning_rate': 6.130924076715803e-06, 'epoch': 2.08}


 71%|███████▏  | 18000/25236 [3:54:38<1:05:04,  1.85it/s]   

{'loss': 0.5141, 'grad_norm': 1.9189631938934326, 'learning_rate': 5.734664764621969e-06, 'epoch': 2.14}


 73%|███████▎  | 18500/25236 [3:59:09<59:56,  1.87it/s]  

{'loss': 0.5143, 'grad_norm': 2.137493371963501, 'learning_rate': 5.338405452528135e-06, 'epoch': 2.2}


 74%|███████▎  | 18576/25236 [3:59:52<59:00,  1.88it/s]  '(MaxRetryError("HTTPSConnectionPool(host='hf-hub-lfs-us-east-1.s3-accelerate.amazonaws.com', port=443): Max retries exceeded with url: /repos/68/06/6806daa1282d81c0310b0d7762f1b9f17a88f4db09f73b467e997d4dfb8bcbc3/6ac8e5921524ba0b7ff5537890a2cb67b63c3d9248e8044a430615bf1f54dd48?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA2JU7TKAQLC2QXPN7%2F20241127%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241127T161017Z&X-Amz-Expires=86400&X-Amz-Signature=f5e42a05914cf5ef1facb7a43fe5d59a9e70586bcb081f65d1efc5f3923931a5&X-Amz-SignedHeaders=host&partNumber=1&uploadId=IxVQYRzAzQRbKI9fvy85ANlUMnJmWStto2IR_Q78hQJAF8WNmVdG49o0a21mmt2hrWFEaI.LT5cLhcBCC_iwFo32I8WZDle8KLY7CoJem5.oq.pgw2ov5SlCtXQVzPhR&x-id=UploadPart (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2406)')))"), '(Request ID: b898e3b7-ff85-4ab6-8fab-15e2d8776ac0)')' thrown while requesting PUT https://hf-hu

{'loss': 0.5117, 'grad_norm': 2.0127975940704346, 'learning_rate': 4.942146140434301e-06, 'epoch': 2.26}


 77%|███████▋  | 19500/25236 [4:08:05<50:15,  1.90it/s]  

{'loss': 0.5104, 'grad_norm': 2.227771520614624, 'learning_rate': 4.545886828340466e-06, 'epoch': 2.32}


 79%|███████▉  | 20000/25236 [4:12:32<46:29,  1.88it/s]  

{'loss': 0.5119, 'grad_norm': 1.7230807542800903, 'learning_rate': 4.149627516246632e-06, 'epoch': 2.38}


 81%|████████  | 20500/25236 [4:17:01<41:39,  1.89it/s]  

{'loss': 0.515, 'grad_norm': 2.8102188110351562, 'learning_rate': 3.753368204152798e-06, 'epoch': 2.44}


 83%|████████▎ | 21000/25236 [4:21:29<37:09,  1.90it/s]  

{'loss': 0.5152, 'grad_norm': 2.353322744369507, 'learning_rate': 3.3571088920589633e-06, 'epoch': 2.5}


 85%|████████▌ | 21500/25236 [4:25:55<32:44,  1.90it/s]  

{'loss': 0.5006, 'grad_norm': 2.4536876678466797, 'learning_rate': 2.9608495799651293e-06, 'epoch': 2.56}


 87%|████████▋ | 22000/25236 [4:30:21<27:54,  1.93it/s]  

{'loss': 0.5171, 'grad_norm': 2.692809581756592, 'learning_rate': 2.5645902678712953e-06, 'epoch': 2.62}


 89%|████████▉ | 22500/25236 [4:34:48<23:52,  1.91it/s]  

{'loss': 0.5174, 'grad_norm': 2.5782158374786377, 'learning_rate': 2.168330955777461e-06, 'epoch': 2.67}


 91%|█████████ | 23000/25236 [4:39:13<19:28,  1.91it/s]

{'loss': 0.5054, 'grad_norm': 3.181828022003174, 'learning_rate': 1.7720716436836266e-06, 'epoch': 2.73}


 93%|█████████▎| 23500/25236 [4:43:38<14:53,  1.94it/s]

{'loss': 0.509, 'grad_norm': 2.4656076431274414, 'learning_rate': 1.3758123315897926e-06, 'epoch': 2.79}


 95%|█████████▌| 24000/25236 [4:48:05<11:01,  1.87it/s]

{'loss': 0.5196, 'grad_norm': 2.672015428543091, 'learning_rate': 9.795530194959582e-07, 'epoch': 2.85}


 97%|█████████▋| 24500/25236 [4:52:36<06:29,  1.89it/s]

{'loss': 0.5075, 'grad_norm': 1.7535483837127686, 'learning_rate': 5.83293707402124e-07, 'epoch': 2.91}


 99%|█████████▉| 25000/25236 [4:57:06<02:05,  1.87it/s]

{'loss': 0.5201, 'grad_norm': 2.7316629886627197, 'learning_rate': 1.8703439530828975e-07, 'epoch': 2.97}


                                                       
100%|██████████| 25236/25236 [5:03:49<00:00,  1.38it/s]

{'eval_loss': 0.46561333537101746, 'eval_runtime': 272.7404, 'eval_samples_per_second': 61.3, 'eval_steps_per_second': 7.663, 'epoch': 3.0}
{'train_runtime': 18229.2929, 'train_samples_per_second': 11.074, 'train_steps_per_second': 1.384, 'train_loss': 0.5792096785422758, 'epoch': 3.0}





TrainOutput(global_step=25236, training_loss=0.5792096785422758, metrics={'train_runtime': 18229.2929, 'train_samples_per_second': 11.074, 'train_steps_per_second': 1.384, 'total_flos': 6593692852813824.0, 'train_loss': 0.5792096785422758, 'epoch': 3.0})

In [21]:
prompt = "Somatic hypermutation allows the immune system to"

from transformers import pipeline

generator = pipeline("text-generation", model="my_awesome_eli5_clm-model")
generator(prompt)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'generated_text': 'Somatic hypermutation allows the immune system to s i o n. t y p e.                              '}]