# Structured Q&A

Source code: https://github.com/mozilla-ai/structured-qa

Docs: https://mozilla-ai.github.io/structured-qa

## Installing dependencies

In [1]:
!git clone --single-branch --branch 5-add-benchmark https://github.com/mozilla-ai/structured-qa

fatal: destination path 'structured-qa' already exists and is not an empty directory.


In [2]:
%pip install ./structured-qa

Processing ./structured-qa
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: structured-qa
  Building wheel for structured-qa (pyproject.toml) ... done
  Created wheel for structured-qa: filename=structured_qa-0.3.3.dev93+g6906991-py3-none-any.whl size=13072 sha256=5d3ec90fa03ce1a4f1fb52f1d1c79ef1dcf475d443839e22372e2d9779e34f54
  Stored in directory: /root/.cache/pip/wheels/b8/d1/8b/1585580e7787d68790745653775eb485d52a0d5386b616c827
Successfully built structured-qa
Installing collected packages: structured-qa
  Attempting uninstall: structured-qa
    Found existing installation: structured-qa 0.3.3.dev93+g6906991
    Uninstalling structured-qa-0.3.3.dev93+g6906991:
      Successfully uninstalled structured-qa-0.3.3.dev93+g6906991
Successfully installed structured-qa-0.3.3.dev93+g6906991


In [3]:
%pip install --quiet https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.4-cu122/llama_cpp_python-0.3.4-cp311-cp311-linux_x86_64.whl

     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 445.2/445.2 MB 1.3 MB/s eta 0:00:00

# Setup

In [4]:
import os

os.environ["LOGURU_LEVEL"] = "INFO"

In [5]:
from loguru import logger

## Function to process all questions for a single section

In [6]:
ANSWER_WITH_TYPE_PROMPT = """
You are a rigorous assistant answering questions.
You only answer based on the current information available.
The current information available is:

```
{CURRENT_INFO}
```

The answer must be ONLY one of the following strings and nothing else:
- YES/NO (for boolean questions)
Is the model an LLM?
YES
- Number (for numeric questions)
How many layers does the model have?
12
- Single letter (for multiple-choice questions)
What is the activation function used in the model? -A: ReLU -B: Sigmoid -C: Tanh
C
"""


def process_section_questions(
    section_file,
    section_data,
    model,
):
    logger.info("Predicting")
    answers = {}
    sections = {}
    for index, row in section_data.iterrows():
        question = row["question"]
        logger.info(f"Question: {question}")
        messages = [
            {
                "role": "system",
                "content": ANSWER_WITH_TYPE_PROMPT.format(
                    CURRENT_INFO=section_file.read_text()
                ),
            },
            {"role": "user", "content": question},
        ]
        response = model.get_response(messages)
        logger.info(f"Answer: {response}")
        answers[index] = response
        sections[index] = None
    return answers, sections
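
To inspect what the function sends to the model without loading a GPU-backed model, the message construction can be sketched with a stand-in. This is a minimal sketch under assumptions: `StubModel`, `build_messages`, and the shortened `PROMPT` are hypothetical names introduced for illustration. The only details taken from the notebook are that the model object exposes `get_response(messages)` and that the system prompt is filled through the `CURRENT_INFO` placeholder.

```python
# Hypothetical stand-ins for illustration; not part of structured_qa.
PROMPT = (
    "Answer using only this information:\n"
    "{CURRENT_INFO}\n"
    "Reply with YES/NO, a number, or a single letter."
)


class StubModel:
    """Mimics the `get_response(messages)` interface used in the notebook."""

    def __init__(self):
        self.calls = []

    def get_response(self, messages):
        # Record what was sent and return a canned answer.
        self.calls.append(messages)
        return "YES"


def build_messages(context, question):
    # The section text goes into the system prompt; the question is the user turn.
    return [
        {"role": "system", "content": PROMPT.format(CURRENT_INFO=context)},
        {"role": "user", "content": question},
    ]


model_stub = StubModel()
context = "The model is a 12-layer LLM."
questions = {0: "Is the model an LLM?", 1: "How many layers does the model have?"}

answers = {}
for index, question in questions.items():
    answers[index] = model_stub.get_response(build_messages(context, question))

print(answers)
```

Because the stub records every call in `model_stub.calls`, the exact system prompt and user turn for each question can be checked before spending GPU time on the real model.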

## Load Model

In [7]:
%pip install --no-cache-dir --upgrade unsloth
%pip uninstall unsloth unsloth_zoo -y
%pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
%pip install --upgrade --no-cache-dir "git+https://github.com/unslothai/unsloth-zoo.git"

Found existing installation: unsloth 2025.1.8
Uninstalling unsloth-2025.1.8:
  Successfully uninstalled unsloth-2025.1.8
Found existing installation: unsloth_zoo 2025.1.5
Uninstalling unsloth_zoo-2025.1.5:
  Successfully uninstalled unsloth_zoo-2025.1.5
Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-_6n90mml/unsloth_f9c5530fd943413db7ff81d7b8e72107
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-_6n90mml/unsloth_f9c5530fd943413db7ff81d7b8e72107
  Resolved https://github.com/unslothai/unsloth.git to commit 038e6d4c8d40207a87297ab3aaf787c19b1006d1
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting unsloth_zoo>=2025.1.4 (from unsloth@ git

In [8]:
from structured_qa.model_loaders import load_unsloth_model

In [9]:
model = load_unsloth_model(
    "unsloth/DeepSeek-R1-Distill-Qwen-7B", "chatml"
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.8: Fast Qwen2 patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Unsloth: Will map <|im_end|> to EOS = <｜end▁of▁sentence｜>.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.


# Run Benchmark

In [None]:
from pathlib import Path

import pandas as pd


logger.info("Loading input data")
data = pd.read_csv("structured-qa/benchmark/structured_qa.csv")
data["pred_answer"] = [None] * len(data)
data["pred_section"] = [None] * len(data)

for section_name, section_data in data.groupby("section"):
    section_file = Path(f"structured-qa/benchmark/perfect_context/{section_name}.txt")

    answers, sections = process_section_questions(section_file, section_data, model)

    for index in section_data.index:
        data.loc[index, "pred_answer"] = str(answers[index]).upper()
        data.loc[index, "pred_section"] = sections[index]

data.to_csv("results.csv")

2025-01-31 12:42:34.602 | INFO     | __main__:<cell line: 0>:6 - Loading input data
2025-01-31 12:42:34.650 | INFO     | __main__:process_section_questions:28 - Predicting
2025-01-31 12:42:34.652 | INFO     | __main__:process_section_questions:33 - Question: In billions, how many trainable parameters does GPT-3 have?
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


# Results

In [None]:
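results = pd.read_csv("results.csv")
for index, result in results.iterrows():
    # Accept predictions that begin with the expected answer, optionally
    # preceded by "-" (e.g. "-C: TANH" for a multiple-choice question).
    # str() guards against empty predictions, which read_csv loads as NaN.
    if str(result["pred_answer"]).startswith(
        (f"-{result['answer']}", f"{result['answer']}")
    ):
        results.loc[index, "pred_answer"] = result["answer"]
# Show the rows where the prediction still differs from the expected answer.
results.loc[results["answer"] != results["pred_answer"]]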

In [None]:
accuracy = sum(results["answer"] == results["pred_answer"]) / len(results)
accuracy
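
The scoring logic of the two cells above can be sketched without pandas on a few made-up rows. This is an illustrative sketch, not the benchmark's own code: `normalize_prediction` is a hypothetical helper name, and the `pairs` data is invented for the example. It mirrors the notebook's rule that a prediction counts as correct when it starts with the expected answer, optionally preceded by `-`.

```python
def normalize_prediction(pred, answer):
    """Return `answer` if `pred` starts with it (optionally after "-"), else `pred`."""
    pred, answer = str(pred), str(answer)
    if pred.startswith((f"-{answer}", answer)):
        return answer
    return pred


# Made-up (expected, predicted) pairs, for illustration only.
pairs = [
    ("YES", "YES"),
    ("C", "-C: TANH"),
    ("12", "12 LAYERS"),
    ("NO", "YES"),
]

normalized = [normalize_prediction(pred, answer) for answer, pred in pairs]
correct = sum(norm == answer for (answer, _), norm in zip(pairs, normalized))
toy_accuracy = correct / len(pairs)
print(normalized, toy_accuracy)
```

Prefix matching is lenient: a prediction of `12` would also be accepted for an expected answer of `1`, so the reported accuracy is an upper bound relative to exact matching.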