# Structured Q&A

Source code: https://github.com/mozilla-ai/structured-qa

Docs: https://mozilla-ai.github.io/structured-qa

## Installing dependencies

In [1]:
%pip install --quiet https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.4-cu122/llama_cpp_python-0.3.4-cp311-cp311-linux_x86_64.whl

In [2]:
%pip install PyPDF2



In [3]:
%pip install git+https://github.com/mozilla-ai/structured-qa.git@5-add-benchmark

Collecting git+https://github.com/mozilla-ai/structured-qa.git@5-add-benchmark
  Cloning https://github.com/mozilla-ai/structured-qa.git (to revision 5-add-benchmark) to /tmp/pip-req-build-g4ugf7tj
  Running command git clone --filter=blob:none --quiet https://github.com/mozilla-ai/structured-qa.git /tmp/pip-req-build-g4ugf7tj
  Running command git checkout -b 5-add-benchmark --track origin/5-add-benchmark
  Switched to a new branch '5-add-benchmark'
  Branch '5-add-benchmark' set up to track remote branch '5-add-benchmark' from 'origin'.
  Resolved https://github.com/mozilla-ai/structured-qa.git to commit c5ee8e63ab951b740147be2d69c2f00549043734
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [4]:
!wget https://raw.githubusercontent.com/mozilla-ai/structured-qa/refs/heads/5-add-benchmark/benchmark/structured_qa.csv

--2025-02-03 14:27:25--  https://raw.githubusercontent.com/mozilla-ai/structured-qa/refs/heads/5-add-benchmark/benchmark/structured_qa.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21441 (21K) [text/plain]
Saving to: ‘structured_qa.csv.2’


2025-02-03 14:27:25 (14.8 MB/s) - ‘structured_qa.csv.2’ saved [21441/21441]



# Setup

In [5]:
import os

os.environ["LOGURU_LEVEL"] = "INFO"

In [6]:
from loguru import logger

In [7]:
import PyPDF2


def load_pdf(pdf_file: str) -> str | None:
    try:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        return "\n".join(page.extract_text() for page in pdf_reader.pages)
    except Exception as e:
        logger.exception(e)
        return None

## Function to Process a Single Document

In [8]:
ANSWER_WITH_TYPE_PROMPT = """
You are a rigorous assistant answering questions.
You must only answer based on the current information available which is:

```
{CURRENT_INFO}
```

If the current information available is not enough to answer the question,
you must return the string "I need more info" and nothing else.

If the current information is enough to answer, you must return one of the following formats:
- YES/NO (for boolean questions)
- Number (for numeric questions)
- Single letter (for multiple-choice questions)
"""


def process_document(
    document_file,
    document_data,
    model,
):
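    logger.info("Predicting")
    # Extract the PDF text once; it is identical for every question on this document.
    document_text = load_pdf(document_file)
    answers = {}
    sections = {}
    for index, row in document_data.iterrows():
        question = row["question"]
        logger.info(f"Question: {question}")
        messages = [
            {
                "role": "system",
                "content": ANSWER_WITH_TYPE_PROMPT.format(
                    CURRENT_INFO=document_text
                ),
            },
            {"role": "user", "content": question},
        ]
        try:
            answer = model.get_response(messages)
        except Exception as e:
            # Any failure (for example, a prompt larger than the context window)
            # is recorded as "Out of context" instead of aborting the run.
            logger.warning(f"Model call failed: {e}")
            answer = "Out of context"
        logger.info(f"Answer: {answer}")
        answers[index] = answer
        # This baseline passes the whole document, so no section is predicted.
        sections[index] = None

    return answers, sections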
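
The fallback behaviour of `process_document` can be exercised without a GPU or a PDF. The sketch below is only an illustration: `StubModel`, `build_messages`, `answer_or_fallback`, and the shortened prompt are hypothetical names introduced here, and the character limit is an assumed stand-in for a real context window. It shows how any exception raised by `get_response` ends up recorded as `"Out of context"`.

```python
# Shortened, hypothetical prompt; the notebook uses ANSWER_WITH_TYPE_PROMPT.
PROMPT = (
    "You must only answer based on the current information available which is:\n"
    "{CURRENT_INFO}\n"
)


def build_messages(context: str, question: str) -> list:
    """Build the same two-message chat layout used in process_document."""
    return [
        {"role": "system", "content": PROMPT.format(CURRENT_INFO=context)},
        {"role": "user", "content": question},
    ]


class StubModel:
    """Hypothetical stand-in exposing get_response(messages) like the loaded model."""

    def __init__(self, max_chars: int):
        # Assumption: a character budget approximates a token-based context window.
        self.max_chars = max_chars

    def get_response(self, messages: list) -> str:
        total = sum(len(message["content"]) for message in messages)
        if total > self.max_chars:
            raise ValueError("prompt exceeds the context window")
        return "YES"


def answer_or_fallback(model, context: str, question: str) -> str:
    """Mirror the try/except in process_document."""
    try:
        return model.get_response(build_messages(context, question))
    except Exception:
        return "Out of context"


model = StubModel(max_chars=500)
short_answer = answer_or_fallback(
    model, "LoRA adapts dense layers.", "Does LoRA adapt dense layers?"
)
long_answer = answer_or_fallback(model, "x" * 1000, "Any question?")
print(short_answer, long_answer)
```

Because the fallback string is stored like any other answer, it is later scored as a wrong prediction rather than skipped.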

## Load Model

In [9]:
from structured_qa.model_loaders import load_llama_cpp_model

In [10]:
model = load_llama_cpp_model(
    "bartowski/Qwen2.5-7B-Instruct-GGUF/Qwen2.5-7B-Instruct-Q8_0.gguf"
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


# Run Benchmark

In [11]:
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd

logger.info("Loading input data")
data = pd.read_csv("structured_qa.csv")
data["pred_answer"] = [None] * len(data)
data["pred_section"] = [None] * len(data)

for document_link, document_data in data.groupby("document"):
    logger.info(f"Downloading document {document_link}")
    downloaded_document = Path(f"{Path(document_link).name}.pdf")
    if not Path(downloaded_document).exists():
        urlretrieve(document_link, downloaded_document)
        logger.info(f"Downloaded {document_link} to {downloaded_document}")
    else:
        logger.info(f"File {downloaded_document} already exists")

    answers, sections = process_document(downloaded_document, document_data, model)

    for index in document_data.index:
        data.loc[index, "pred_answer"] = str(answers[index]).upper()
        data.loc[index, "pred_section"] = sections[index]

data.to_csv("results.csv")

2025-02-03 14:27:35.309 | INFO     | __main__:<cell line: 0>:6 - Loading input data
2025-02-03 14:27:35.319 | INFO     | __main__:<cell line: 0>:12 - Downloading document https://aiindex.stanford.edu/wp-content/uploads/2024/05/HAI_AI-Index-Report-2024.pdf
2025-02-03 14:27:35.322 | INFO     | __main__:<cell line: 0>:18 - File HAI_AI-Index-Report-2024.pdf.pdf already exists
2025-02-03 14:27:35.325 | INFO     | __main__:process_document:27 - Predicting
2025-02-03 14:27:35.328 | INFO     | __main__:process_document:32 - Question: which type of risk was identified as the leading concern globally? -A: Fairness risks. -B: Privacy and data governance risks. -C: Risks related to generative AI deployment.
2025-02-03 14:28:23.259 | 

In [14]:
results = pd.read_csv("results.csv")
for index, result in results.iterrows():
    if result["pred_answer"].startswith(
        (f"-{result['answer']}", f"{result['answer']}")
    ):
        results.loc[index, "pred_answer"] = result["answer"]
results.loc[results["answer"] != results["pred_answer"]]

| Unnamed: 0.1 | Unnamed: 0 | document | section | question | answer | pred_answer | pred_section |
|---|---|---|---|---|---|---|---|
| 10 | 10 | https://arxiv.org/pdf/1706.03762 | 5.4 Regularization | What was the dropout rate used for the base mo... | 0.1 | YES | |
| 17 | 17 | https://arxiv.org/pdf/2106.09685.pdf | 4 OUR METHOD | Does LoRA work with any neural network contain... | YES | NO | |
| 22 | 22 | https://authorsalliance.org/wp-content/uploads... | HOW DO YOU CHOOSE AN OPEN ACCESS PUBLISHER? | how many peer-reviewed open access journals ar... | A | B | |
| 24 | 24 | https://authorsalliance.org/wp-content/uploads... | OVERCOMING RESERVATIONS ABOUT OPEN ACCESS | Are publication fees required for all open acc... | NO | I NEED MORE INFO | |
| 27 | 27 | https://arxiv.org/pdf/2201.11903 | 3 Arithmetic Reasoning | Is Arithmetic reasoning is a task that languag... | NO | OUT OF CONTEXT | |
| 28 | 28 | https://arxiv.org/pdf/2201.11903 | 3.1 Experimental Setup | How many large language models were evaluated? | 5 | OUT OF CONTEXT | |
| 29 | 29 | https://arxiv.org/pdf/2201.11903 | 3.1 Experimental Setup | How many benchmarks were used to evaluate arit... | 5 | OUT OF CONTEXT | |
| 30 | 30 | https://arxiv.org/pdf/2201.11903 | 5 Symbolic Reasoning | Is symbolic reasoning usually simple for human... | YES | OUT OF CONTEXT | |
| 31 | 31 | https://arxiv.org/pdf/2201.11903 | 5 Symbolic Reasoning | How many words have the example names that the... | B | OUT OF CONTEXT | |
| 32 | 32 | https://arxiv.org/pdf/2201.11903 | 5 Symbolic Reasoning | Which symbolic reasoning task is used as an ou... | A | OUT OF CONTEXT | |
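
The prefix matching used in the cell above can be isolated into a small helper to make its behaviour easier to inspect. This is a sketch, not part of the `structured_qa` package: `normalize_prediction` is a hypothetical name, and the example predictions are illustrative strings rather than recorded model outputs.

```python
def normalize_prediction(pred_answer: str, answer: str) -> str:
    """Collapse a verbose prediction to the expected answer when it starts with it.

    Mirrors the startswith check in the cell above: a prediction such as
    "-A: FAIRNESS RISKS." counts as "A".
    """
    pred = str(pred_answer).strip().upper()
    expected = str(answer).strip().upper()
    if pred.startswith((expected, f"-{expected}")):
        return expected
    return pred


# A multiple-choice prediction that repeats the option text is accepted.
print(normalize_prediction("-A: FAIRNESS RISKS.", "A"))
# A fallback string is left untouched and will count as a mismatch.
print(normalize_prediction("OUT OF CONTEXT", "YES"))
# Caveat: prefix matching is lenient, so "50" is accepted for an expected "5".
print(normalize_prediction("50", "5"))
```

The last call shows a limitation of prefix matching: numeric or single-letter answers can be credited when the prediction merely begins with the same characters, so a stricter comparison may be preferable for those question types.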


In [15]:
accuracy = sum(results["answer"] == results["pred_answer"]) / len(results)
accuracy

0.5339805825242718
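
A single accuracy number hides why predictions fail. The sketch below groups mismatches into fallback strings versus genuinely wrong answers. It uses only the ten mismatched rows visible in the table above as sample data, so the counts describe that excerpt and not the full benchmark; `categorize` and `FALLBACKS` are hypothetical names introduced for illustration.

```python
from collections import Counter

# The two non-answers the pipeline can produce (upper-cased, as in results.csv).
FALLBACKS = {"OUT OF CONTEXT", "I NEED MORE INFO"}


def categorize(answer: str, pred_answer: str) -> str:
    """Label a prediction as correct, a specific fallback, or a wrong answer."""
    if pred_answer == answer:
        return "correct"
    if pred_answer in FALLBACKS:
        return pred_answer
    return "wrong answer"


# (answer, pred_answer) pairs copied from the mismatch table shown above.
rows = [
    ("0.1", "YES"),
    ("YES", "NO"),
    ("A", "B"),
    ("NO", "I NEED MORE INFO"),
    ("NO", "OUT OF CONTEXT"),
    ("5", "OUT OF CONTEXT"),
    ("5", "OUT OF CONTEXT"),
    ("YES", "OUT OF CONTEXT"),
    ("B", "OUT OF CONTEXT"),
    ("A", "OUT OF CONTEXT"),
]

breakdown = Counter(categorize(answer, pred) for answer, pred in rows)
for label, count in breakdown.most_common():
    print(f"{label}: {count}")
```

In this excerpt, six of the ten mismatches are `OUT OF CONTEXT` fallbacks, all on the same document (https://arxiv.org/pdf/2201.11903).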