
Story: Plots, Metrics for LLM support [2] #138

Closed · 2 tasks done
SoyGema opened this issue Aug 9, 2023 · 2 comments

SoyGema commented Aug 9, 2023

Submission Type

  • Discussion

Context

Offer ML support for other use cases beyond translation, which would mostly involve text generation (Q&A, summarization, etc.).
Separated from #137 in order to focus on solving plots for one use case while offering possible support for others.

Impact

  • Offer support

Issue creator Goal

  • Offer support

Leaving a placeholder here for possible questions.
Will probably submit some Q&A coming from the discussion in the meantime.
Thanks!

SoyGema (Author) commented Aug 24, 2023

Q&A

Task variants

  • Extractive QA: The model extracts the answer from a context. The context here could be a provided text, a table, or even HTML! This is usually solved with Falcon, RoBERTa, and BERT-like models. This is the most common variant in business use cases.

  • Open Generative QA: The model generates free text directly based on the context. You can learn more about the Text Generation task in its page.

  • Closed Generative QA: In this case, no context is provided. The answer is completely generated by a model.
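For reference, a minimal extractive QA inference sketch (the checkpoint name and the question/context strings are only illustrative, not part of the fine-tuning setup below):

from transformers import pipeline

# SQuAD-fine-tuned DistilBERT checkpoint, used here only to illustrate the task
qa = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")
result = qa(
    question="What does DVC version?",
    context="DVC versions data and ML models alongside source code, storing file hashes in small metafiles.",
)
print(result["answer"], result["score"])  # extracted span + confidence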

Code

Preloads a model from the DistilBERT family. Uses the Trainer class from the HF Transformers library, so the DVCLive HF callback can easily be implemented here. Latest DVCLive HF callback taken from #649.
Loading the dataset, tokenizer, and preprocessing function is omitted for simplicity but can be found here.

from dvclive.huggingface import DVCLiveCallback
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
...
...
training_args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    logging_strategy="epoch",         
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    save_strategy="epoch",     
    weight_decay=0.01,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    # Note: the vanilla Trainer does not accept `post_process_function`; the HF
    # question-answering example handles this via a QuestionAnsweringTrainer subclass.
    post_process_function=post_processing_function,
    callbacks=[DVCLiveCallback(save_dvc_exp=True, log_model=True)],  # DVCLive: track run as a DVC experiment
)

trainer.train()

Metrics

During training, the model computes the loss.
As an evaluation metric for the fine-tuned model, we use SQuAD (exact match and F1). We define the metric, take the theoretical answers and the predicted answers, and compute the metric. This evaluation metric is computed after fine-tuning.

It returns a dictionary that can be logged through the DVCLive API:

import evaluate

metric = evaluate.load("squad")

# `small_eval_set` is the (omitted) evaluation subset; `predicted_answers` is the
# list of {"id": ..., "prediction_text": ...} dicts produced by the post-processing step.
theoretical_answers = [
    {"id": ex["id"], "answers": ex["answers"]} for ex in small_eval_set
]

metrics_load_dvc = metric.compute(predictions=predicted_answers, references=theoretical_answers)

metrics_load_dvc returns

{'exact_match': 83.0, 'f1': 88.25}

As I understand it, once the model is fine-tuned we should see these values in each table row (in the DVC VS Code extension) per experiment. Important note: we can use the loss as a live metric during fine-tuning, while exact_match and f1 only appear in the table once the fine-tuning process is finished.

from dvclive import Live

with Live() as live:
    live.log_metric("exact_match", metrics_load_dvc.get("exact_match"))
    live.log_metric("f1", metrics_load_dvc.get("f1"))

SoyGema (Author) commented Aug 30, 2023

Summarization

Task variants

Summarization creates a shorter version of a document or an article that captures all the important information. Along with translation, it is another example of a task that can be formulated as a sequence-to-sequence task. Summarization can be:

  • Extractive: extract the most relevant information from a document.
  • Abstractive: generate new text that captures the most relevant information.
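
For reference, a minimal sketch of the sequence-to-sequence formulation with T5, which frames summarization as plain text-to-text generation behind a task prefix (the input text here is only illustrative):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

text = (
    "The bill establishes a new grant program to modernize water infrastructure, "
    "authorizes funding through 2030, and directs the agency to report annually on progress."
)
# T5 expects a task prefix; summarization is just conditional text generation
inputs = tokenizer("summarize: " + text, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))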

I saw that @dberenbaum used this in his example, so there is not much to add. Maybe he could use some of the text above to complement his README.md and add an HTML Studio link to give it more reach in his repo? If Dave reads this at some point, please also consider changing the repo name to something more task-oriented plus frameworks, so it can be indexed as such when users search: summarization_hf_dvc instead of seq2seqhf.
Thanks for making it anyway!

Code

Works with certain model architectures (BART, Pegasus, T5 family). Uses the Trainer class from the HF Transformers library, so the DVCLive HF callback can easily be implemented here. Latest DVCLive HF callback taken from #649.
Loading the dataset, tokenizer, and preprocessing function is omitted for simplicity but can be found here.

from dvclive.huggingface import DVCLiveCallback
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
...
...
training_args = Seq2SeqTrainingArguments(
    output_dir="my_awesome_billsum_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=4,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_billsum["train"],
    eval_dataset=tokenized_billsum["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[DVCLiveCallback(save_dvc_exp=True, log_model=True)],  # DVCLive: track run as a DVC experiment
)

trainer.train()
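
The compute_metrics function referenced above belongs to the omitted preprocessing code; a minimal sketch, following the usual HF summarization recipe (it assumes the tokenizer loaded earlier):

import numpy as np
import evaluate

rouge = evaluate.load("rouge")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 (ignored label positions) with the pad token id before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return {k: round(v, 4) for k, v in result.items()}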

Metrics

During training, the model computes the loss.
As an evaluation metric for the fine-tuned model, we use ROUGE. We define the metric, take the reference summary and the predicted summary, and compute the metric. This evaluation metric is computed after fine-tuning.

It returns a dictionary that can be logged through the DVCLive API:

import evaluate
metric = evaluate.load("rouge")
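
A minimal sketch of the compute step, assuming decoded_preds and decoded_labels hold the generated and reference summaries as plain strings (produced by the omitted decoding step):

rouge_scores = metric.compute(
    predictions=decoded_preds,
    references=decoded_labels,
    use_stemmer=True,
)
# returns a dict such as {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}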

As I understand it, once the model is fine-tuned we should see these values in each table row (in the DVC VS Code extension) per experiment.

from dvclive import Live

with Live() as live:
    for name, value in rouge_scores.items():
        live.log_metric(name, value)

SoyGema closed this as completed Sep 24, 2023