
Story: Plots, Metrics for LLM support [2] #138

Closed · 2 tasks done
SoyGema opened this issue Aug 9, 2023 · 2 comments

SoyGema commented Aug 9, 2023

Submission Type

  • Discussion

Context

Offer ML support for other use cases beyond translation, which would mostly involve text generation (Q&A, summarization, etc.).
Separated from #137 in order to focus on solving plots for one use case while offering possible support for others.

Impact

  • Offer support

Issue creator Goal

  • Offer support

Leaving a placeholder here for possible questions.
Will probably submit some Q&A coming from the discussion in the meantime.
Thanks!

SoyGema (Author) commented Aug 24, 2023

Q&A

Task variants

  • Extractive QA: The model extracts the answer from a context. The context here could be a provided text, a table, or even HTML! This is usually solved with Falcon, RoBERTa, and BERT-like models. This is the most common variant in business use cases.

  • Open Generative QA: The model generates free text directly based on the context. You can learn more about the Text Generation task in its page.

  • Closed Generative QA: In this case, no context is provided. The answer is completely generated by a model.
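For reference, a minimal extractive QA inference sketch (the checkpoint name and the question/context strings are only illustrative, not part of the fine-tuning setup below):

from transformers import pipeline

# SQuAD-fine-tuned DistilBERT checkpoint, used here only to illustrate the task
qa = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")
result = qa(
    question="What does DVC version?",
    context="DVC versions data and ML models alongside source code, storing file hashes in small metafiles.",
)
print(result["answer"], result["score"])  # extracted span + confidence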

Code

Preloads a model from the DistilBERT family. Uses the Trainer class from the HF Transformers library, so the DVCLive HF callback can easily be implemented here. Latest DVCLive HF callback taken from #649.
Loading the dataset, tokenizer, and preprocessing function is omitted for simplicity but can be found here.

from dvclive.huggingface import DVCLiveCallback
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
...
...
training_args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    logging_strategy="epoch",         
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    save_strategy="epoch",     
    weight_decay=0.01,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    # Note: the vanilla Trainer does not accept `post_process_function`; the HF
    # question-answering example handles this via a QuestionAnsweringTrainer subclass.
    post_process_function=post_processing_function,
    callbacks=[DVCLiveCallback(save_dvc_exp=True, log_model=True)],  # DVCLive: track run as a DVC experiment
)

trainer.train()

Metrics

During training, the model computes the loss.
As an evaluation metric for the fine-tuned model, we use SQuAD (exact match and F1). We define the metric, take the theoretical answers and the predicted answers, and compute the metric. This evaluation metric is computed after fine-tuning.

It returns a dictionary that can be logged through the DVCLive API:

import evaluate

metric = evaluate.load("squad")

# `small_eval_set` is the (omitted) evaluation subset; `predicted_answers` is the
# list of {"id": ..., "prediction_text": ...} dicts produced by the post-processing step.
theoretical_answers = [
    {"id": ex["id"], "answers": ex["answers"]} for ex in small_eval_set
]

metrics_load_dvc = metric.compute(predictions=predicted_answers, references=theoretical_answers)

metrics_load_dvc returns

{'exact_match': 83.0, 'f1': 88.25}

As I understand it, once the model is fine-tuned we should see these values in each table row (in the DVC VS Code extension) per experiment. Important note: we can use the loss as a live metric during fine-tuning, while exact_match and f1 only appear in the table once the fine-tuning process is finished.

from dvclive import Live

with Live() as live:
    live.log_metric("exact_match", metrics_load_dvc.get("exact_match"))
    live.log_metric("f1", metrics_load_dvc.get("f1"))

SoyGema (Author) commented Aug 30, 2023

Summarization

Task variants

Summarization creates a shorter version of a document or an article that captures all the important information. Along with translation, it is another example of a task that can be formulated as a sequence-to-sequence task. Summarization can be:

  • Extractive: extract the most relevant information from a document.
  • Abstractive: generate new text that captures the most relevant information.
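
For reference, a minimal sketch of the sequence-to-sequence formulation with T5, which frames summarization as plain text-to-text generation behind a task prefix (the input text here is only illustrative):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

text = (
    "The bill establishes a new grant program to modernize water infrastructure, "
    "authorizes funding through 2030, and directs the agency to report annually on progress."
)
# T5 expects a task prefix; summarization is just conditional text generation
inputs = tokenizer("summarize: " + text, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))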

I saw that @dberenbaum used this in his example, so there is not much to add. Maybe he could use some of the text above to complement his README.md and add an HTML Studio link to give it more reach in his repo? If Dave reads this at some point, please also consider changing the repo name to something more task-oriented plus frameworks, so it can be indexed as such when users search: summarization_hf_dvc instead of seq2seqhf.
Thanks for making it anyway!

Code

Works with certain model architectures (BART, Pegasus, T5 family). Uses the Trainer class from the HF Transformers library, so the DVCLive HF callback can easily be implemented here. Latest DVCLive HF callback taken from #649.
Loading the dataset, tokenizer, and preprocessing function is omitted for simplicity but can be found here.

from dvclive.huggingface import DVCLiveCallback
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
...
...
training_args = Seq2SeqTrainingArguments(
    output_dir="my_awesome_billsum_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=4,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_billsum["train"],
    eval_dataset=tokenized_billsum["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[DVCLiveCallback(save_dvc_exp=True, log_model=True)],  # DVCLive: track run as a DVC experiment
)

trainer.train()
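
The compute_metrics function referenced above belongs to the omitted preprocessing code; a minimal sketch, following the usual HF summarization recipe (it assumes the tokenizer loaded earlier):

import numpy as np
import evaluate

rouge = evaluate.load("rouge")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 (ignored label positions) with the pad token id before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return {k: round(v, 4) for k, v in result.items()}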

Metrics

During training, the model computes the loss.
As an evaluation metric for the fine-tuned model, we use ROUGE. We define the metric, take the reference summary and the predicted summary, and compute the metric. This evaluation metric is computed after fine-tuning.

It returns a dictionary that can be logged through the DVCLive API:

import evaluate
metric = evaluate.load("rouge")
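
A minimal sketch of the compute step, assuming decoded_preds and decoded_labels hold the generated and reference summaries as plain strings (produced by the omitted decoding step):

rouge_scores = metric.compute(
    predictions=decoded_preds,
    references=decoded_labels,
    use_stemmer=True,
)
# returns a dict such as {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}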

As I understand it, once the model is fine-tuned we should see these values in each table row (in the DVC VS Code extension) per experiment.

from dvclive import Live

with Live() as live:
    for name, value in rouge_scores.items():
        live.log_metric(name, value)

SoyGema closed this as completed Sep 24, 2023