# Baseline model without RAG

This notebook sets up a baseline model for the QA task that answers without the RAG pipeline.

For a fair comparison, we use the same backbone model as the RAG pipeline, `meta-llama/Llama-3.2-3B-Instruct`, together with the same data type (fp16), the same tokenizer configuration, and the same prompt format.

In [None]:
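import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from huggingface_hub import login


model_name = "meta-llama/Llama-3.2-3B-Instruct"

# Authenticate with the Hugging Face Hub; the token is read from the
# LANGCHAIN_API_KEY environment variable.
login(token=os.getenv("LANGCHAIN_API_KEY"))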

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

generation_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16
)

Device set to use cuda:0


In [None]:
# Load the QA test set
import pandas as pd
# qa_df = pd.read_csv("../data/annotated/QA_pairs_1.csv")
qa_df = pd.read_csv("./data/test/test_questions.csv")

# doc_ids = qa_df["Doc_id"].tolist()
questions = qa_df["Question"].tolist()
# answers = qa_df["Reference_Answers"].tolist()

# # random sample 10 qa pairs
# import random
# sample_size = 10
# random.seed(747)
# sample_indices = random.sample(range(len(questions)), sample_size)
# sample_doc_ids = [doc_ids[i] for i in sample_indices]
# sample_questions = [questions[i] for i in sample_indices]
# sample_answers = [answers[i] for i in sample_indices]

In [None]:
template = """
You are an expert assistant answering factual questions about various aspects of Pittsburgh or Carnegie Mellon University (CMU), including history, policy, culture, events, and more.
If you do not know the answer, just say "I don't know."

Important Instructions:
- Answer concisely without repeating the question.
- Do **not** use complete sentences. Provide only the word, name, date, or phrase that directly answers the question. For example, given the question "When was Carnegie Mellon University founded?", you should only answer "1900".

Examples:
Question: Who is Pittsburgh named after?
Answer: William Pitt
Question: What famous machine learning venue had its first conference in Pittsburgh in 1980?
Answer: ICML
Question: What musical artist is performing at PPG Arena on October 13?
Answer: Billie Eilish

Question: {question} \n\n
Answer:
"""
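
The template has a single `{question}` placeholder that is filled with `str.format`. The sketch below uses a shortened, hypothetical stand-in for the template (`demo_template`) and a hypothetical helper (`build_prompt`) to show what the filled prompt looks like. Note that the `\n\n` escape inside a regular triple-quoted string becomes two real newlines, so three line breaks separate the question from `Answer:`.

```python
# Shortened, hypothetical stand-in for the notebook's template.
demo_template = """
Answer concisely. If you do not know the answer, just say "I don't know."

Question: {question} \n\n
Answer:
"""


def build_prompt(template: str, question: str) -> str:
    """Substitute the question into the template's {question} placeholder."""
    return template.format(question=question)


prompt = build_prompt(demo_template, "How many bridges does Pittsburgh have?")
print(repr(prompt))  # repr makes the newlines and trailing space visible

# Braces in the question are safe: only the template is parsed for
# placeholders, not the value substituted into it.
tricky = build_prompt(demo_template, "What does {x} mean?")
```

If the template itself ever needs a literal brace, it has to be doubled (`{{` or `}}`), otherwise `str.format` treats it as a placeholder.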

In [None]:
# Use the template to generate an answer for each question
from tqdm import tqdm

generated_answers = []
for question in tqdm(questions):
    full_prompt = template.format(question=question)
    messages = [
        {"role": "user", "content": full_prompt},
    ]
    output = generation_pipe(messages, max_new_tokens=50)
    # For chat input, "generated_text" holds the whole conversation;
    # the last message is the assistant's reply.
    generated_answers.append(output[0]["generated_text"][-1]["content"])

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
100%|██████████| 574/574 [02:04<00:00,  4.60it/s]
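
The warning above suggests sending more than one input to the pipeline at a time. Transformers pipelines accept a list of inputs and a `batch_size` argument, and the left padding configured on the tokenizer is what batched generation with a decoder-only model needs. The following is a minimal sketch of the batching logic only: `fake_generate` is a hypothetical stand-in for `generation_pipe`, assumed to return one `[{"generated_text": [...]}]` list per input chat, and `batched`, `last_reply`, and `answer_in_batches` are illustrative helpers, not part of any library.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def last_reply(result: list) -> str:
    """Return the final message of a chat-style result, whitespace-stripped."""
    return result[0]["generated_text"][-1]["content"].strip()


def fake_generate(chats: List[list], max_new_tokens: int = 50) -> List[list]:
    """Hypothetical stand-in for generation_pipe: one result list per chat."""
    reply = {"role": "assistant", "content": " stub answer "}
    return [[{"generated_text": chat + [reply]}] for chat in chats]


def answer_in_batches(generate: Callable, prompts: Sequence[str],
                      batch_size: int = 8) -> List[str]:
    """Wrap each prompt as a one-message chat and generate chunk by chunk."""
    answers: List[str] = []
    for chunk in batched(prompts, batch_size):
        chats = [[{"role": "user", "content": p}] for p in chunk]
        results = generate(chats, max_new_tokens=50)
        answers.extend(last_reply(r) for r in results)
    return answers


demo_prompts = [f"question {i}" for i in range(5)]
demo_answers = answer_in_batches(fake_generate, demo_prompts, batch_size=2)
print(demo_answers)
```

With the real pipeline, the call inside the loop would presumably become `generation_pipe(chats, max_new_tokens=50, batch_size=len(chats))`; whether this actually speeds things up depends on the GPU and on how much the prompt lengths vary.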


In [None]:
# write all columns to a csv file
# results_df = pd.DataFrame({
#         "Doc_id": doc_ids,
#         "Question": questions,
#         "Reference_Answers": answers,
#         "Generated_Answer": generated_answers,
#     })

results_df = pd.DataFrame({
        "Question": questions,
        "Generated_Answer": generated_answers,
    })

# save the results to a csv file
results_df.to_csv("./output/closebook_baseline.csv", index=False)
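
`DataFrame.to_csv` does not create missing parent directories, so the call above fails if `./output` does not exist yet. As a minimal sketch of one way to guard against that, the hypothetical helper `save_answers` below creates the directory first. It uses the standard-library `csv` module rather than pandas so that it runs on its own, and it writes to a temporary directory instead of the notebook's real output path.

```python
import csv
import tempfile
from pathlib import Path


def save_answers(path, questions, answers) -> Path:
    """Write question/answer pairs to a CSV file, creating parent folders."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # no error if it exists
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Question", "Generated_Answer"])
        writer.writerows(zip(questions, answers))
    return path


# Demo in a temporary directory; the two rows come from the results table.
with tempfile.TemporaryDirectory() as tmp:
    out = save_answers(
        Path(tmp) / "output" / "closebook_baseline.csv",
        ["How many bridges does Pittsburgh have?",
         "Who named the city of Pittsburgh?"],
        ["403", "General Robert Moore"],
    )
    with out.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows)
```

In the notebook itself, a single `os.makedirs("./output", exist_ok=True)` before `results_df.to_csv(...)` would have the same effect.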

In [None]:
results_df

,Question,Generated_Answer
0,"What bank, which is the 5th largest in the US,...",PNC Bank
1,How many bridges does Pittsburgh have?,403
2,Who named the city of Pittsburgh?,General Robert Moore
3,At what park do the three rivers converge in P...,Point State Park
4,How many neighborhoods does Pittsburgh have?,19
...,...,...
569,What is the primary focus of the event at the ...,Pittsburgh JazzLive
570,Where and when is the Pittsburgh Veg Fair held...,Pennsylvania State Farm Show Complex
571,How can restaurants get involved with Pittsbur...,Register online through VisitPittsburgh
572,What are the benefits of sponsoring the Pittsb...,I don't know
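
Because the prompt tells the model to reply "I don't know" when it lacks an answer, one quick check on a closed-book run is how often it abstains. The sketch below is one possible way to count that; `is_abstention` is a hypothetical helper, and `sample_answers` contains only the nine answers visible in the preview above, so the resulting rate says nothing about the full set of 574 questions.

```python
def is_abstention(answer: str) -> bool:
    """True if the answer is just "I don't know", ignoring case and quotes."""
    normalized = answer.replace("\u2019", "'")  # curly to straight apostrophe
    normalized = normalized.strip().strip(".\"'").lower()
    return normalized == "i don't know"


# The nine answers shown in the preview of results_df.
sample_answers = [
    "PNC Bank",
    "403",
    "General Robert Moore",
    "Point State Park",
    "19",
    "Pittsburgh JazzLive",
    "Pennsylvania State Farm Show Complex",
    "Register online through VisitPittsburgh",
    "I don't know",
]

abstention_count = sum(is_abstention(a) for a in sample_answers)
abstention_rate = abstention_count / len(sample_answers)
print(f"{abstention_count} of {len(sample_answers)} answers are abstentions")
```

On the real output, the same function could be applied to the `Generated_Answer` column, for example with `results_df["Generated_Answer"].map(is_abstention).mean()`.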
