# QAMaster: Fine-tuned Q/A Chat Box
- Summer 2024

This project builds a Q/A chat box by fine-tuning several pre-trained models on the SQuAD dataset, evaluating each one, and selecting the best by F1 score and Exact Match.

Hugging Face access token:

In [3]:
import os
token = os.environ.get("HF_TOKEN")  # read the token from the environment instead of hardcoding a secret

In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, AutoConfig
from datasets import load_dataset, Dataset
import numpy as np
from sklearn.model_selection import train_test_split

## Intro
- Loading Pre-trained Models
- Loading the SQuAD dataset

In [5]:
models_to_finetune = [
    "tuner007/pegasus_paraphrase",  # Pegasus fine-tuned for paraphrasing (encoder-decoder)
    "gpt2",                         # GPT-2 (decoder-only)
]

tokenizer_models = {}
for model_name in models_to_finetune:
    # `token` replaces the deprecated `use_auth_token` argument.
    tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
    config = AutoConfig.from_pretrained(model_name, token=token)
    # Pegasus is an encoder-decoder model; AutoModelForCausalLM loads only its decoder.
    model = AutoModelForCausalLM.from_pretrained(model_name, config=config, token=token)

    tokenizer_models[model_name] = (tokenizer, model)

Some weights of PegasusForCausalLM were not initialized from the model checkpoint at tuner007/pegasus_paraphrase and are newly initialized: ['model.decoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [24]:
def load_qa_dataset():
    dataset = load_dataset("squad")
    return dataset
dataset = load_qa_dataset()

## PreProcessing
- Split QA Dataset into Train, Validation, and Test Sets

In [25]:
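# The loaded SQuAD dataset has only train and validation splits,
# so the official validation split serves as the test set.
train_valid_dataset = dataset["train"]
test_dataset = dataset["validation"]

# Dataset.train_test_split keeps both halves as Dataset objects, which Trainer expects.
split = train_valid_dataset.train_test_split(test_size=0.2, seed=42)
train_dataset, eval_dataset = split["train"], split["test"]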
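
The splits at this point still hold raw SQuAD records (`context`, `question`, `answers`), while `Trainer` expects tokenized inputs. One possible intermediate step, sketched here as an assumption and not as the notebook's own method, is to flatten each record into a single prompt string that a causal language model can be trained on. The function name `format_squad_example`, the prompt layout, and the sample record are all hypothetical.

```python
def format_squad_example(example, include_answer=True):
    """Turn one SQuAD record into a single prompt string for a causal language model."""
    prompt = (
        f"Context: {example['context']}\n"
        f"Question: {example['question']}\n"
        "Answer:"
    )
    if include_answer:
        # SQuAD stores answers as {"text": [...], "answer_start": [...]}; use the first answer.
        prompt += " " + example["answers"]["text"][0]
    return prompt


# Hypothetical record in the SQuAD schema, for illustration only.
sample = {
    "id": "0",
    "title": "Example",
    "context": "Paris is the capital of France.",
    "question": "What is the capital of France?",
    "answers": {"text": ["Paris"], "answer_start": [0]},
}

print(format_squad_example(sample))
```

With `include_answer=False` the same function yields an inference-time prompt that ends at `Answer:`, leaving the model to generate the rest. The resulting strings would still need to be passed through the tokenizer before training.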

## Fine-tuning Function
- Define a helper that trains a model with `Trainer` and returns its evaluation results

In [26]:
def fine_tune_model(model_name, tokenizer, model, train_dataset, eval_dataset, training_args):
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
    )

    trainer.train()

    eval_results = trainer.evaluate()

    return eval_results

## Fine-tune and Evaluation
- Training arguments
- Results
- Best Model

In [27]:
training_args = TrainingArguments(
    output_dir="./results",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
)
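
To get a feel for what these arguments imply, the rough arithmetic below estimates the number of optimizer steps and checkpoints. It is a sketch under several assumptions that the notebook does not state: a single device, no gradient accumulation, and a training set of about 80% of the 87,599 SQuAD training rows. The helper `training_schedule` is hypothetical.

```python
import math


def training_schedule(num_examples, batch_size, epochs, save_steps):
    """Estimate steps per epoch, total steps, and checkpoints written during training."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # last partial batch still counts
    total_steps = steps_per_epoch * epochs
    checkpoints_written = total_steps // save_steps
    return steps_per_epoch, total_steps, checkpoints_written


# Assumed training-set size: 87,599 rows minus a 20% evaluation split.
num_train = 87599 - math.ceil(0.2 * 87599)

steps_per_epoch, total_steps, checkpoints = training_schedule(
    num_examples=num_train, batch_size=2, epochs=3, save_steps=10_000
)
print(steps_per_epoch, total_steps, checkpoints)
```

Under these assumptions the run takes on the order of 100,000 steps and writes about ten checkpoints, of which `save_total_limit=2` keeps only the two most recent.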

In [14]:
results = {}

for model_name, (tokenizer, model) in tokenizer_models.items():
    print(f"Fine-tuning {model_name}...")
    eval_results = fine_tune_model(model_name, tokenizer, model, train_dataset, eval_dataset, training_args)
    results[model_name] = eval_results

    print(f"Evaluation results for {model_name}:")
    print(f"Exact Match: {eval_results['exact_match']:.2f}")
    print(f"F1 Score: {eval_results['f1']:.2f}")
    print()

Fine-tuning tuner007/pegasus_paraphrase...
Error during fine-tuning tuner007/pegasus_paraphrase: 2
Evaluation results for tuner007/pegasus_paraphrase:


KeyError: 'exact_match'
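
The `KeyError` is expected with this setup: `Trainer.evaluate()` reports keys prefixed with `eval_` (such as `eval_loss`), and it only includes task metrics when a `compute_metrics` function is supplied and `prediction_loss_only` is not set to `True`. The block below is a minimal sketch of SQuAD-style Exact Match and token-level F1 on answer strings, following the usual normalization (lowercasing, stripping punctuation and articles). The function names are illustrative, and wiring them into `Trainer` would additionally require decoding model outputs into text, which is not shown.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(f1_score("in Paris France", "Paris"))
```

Exact Match is strict and rewards only fully correct answers, while F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.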

In [15]:
best_model = max(results, key=lambda x: results[x]['f1'])
print(f"Best model based on F1 score: {best_model}")
print(f"Evaluation results - Exact Match: {results[best_model]['exact_match']:.2f}, F1 Score: {results[best_model]['f1']:.2f}")

KeyError: 'f1'
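
Model selection fails here for the same reason: no entry in `results` contains an `f1` key. A more defensive selection, sketched below with a hypothetical helper `pick_best_model`, skips models that did not report the metric and returns `None` when none did. The numbers in `example_results` are made up for illustration and are not results from this notebook.

```python
def pick_best_model(results, metric="f1"):
    """Return (name, score) for the best model, or None if no model reports the metric."""
    scored = {name: m[metric] for name, m in results.items() if metric in m}
    if not scored:
        return None
    best = max(scored, key=scored.get)
    return best, scored[best]


# Made-up values for illustration only.
example_results = {
    "model_a": {"eval_loss": 2.1},
    "model_b": {"eval_loss": 1.8, "f1": 0.5, "exact_match": 0.25},
}

print(pick_best_model(example_results))
print(pick_best_model({"model_a": {"eval_loss": 2.1}}))
```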

## Test

In [34]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})