# Structured Q&A

Source code: https://github.com/mozilla-ai/structured-qa

Docs: https://mozilla-ai.github.io/structured-qa

## GPU Check

First, you'll need to enable GPUs for the notebook:

- Navigate to `Edit`→`Notebook Settings`
- Select T4 GPU from the Hardware Accelerator section
- Click `Save` and accept.

Next, we'll confirm that we can connect to the GPU:

In [6]:
import torch

if not torch.cuda.is_available():
    raise RuntimeError("GPU not available")
else:
    print("GPU is available!")

GPU is available!


## Installing dependencies

In [None]:
%pip install --quiet PyPDF2 ragatouille structured-qa

In [None]:
!wget https://raw.githubusercontent.com/mozilla-ai/structured-qa/refs/heads/main/benchmark/structured_qa.csv

# Setup

In [10]:
import os
import google.generativeai as genai
from google.colab.userdata import get, SecretNotFoundError

try:
    genai.configure(api_key=get("GOOGLE_API_KEY"))
except SecretNotFoundError as e:
    raise RuntimeError("Please set the GOOGLE_API_KEY secret to your API key") from e
os.environ["LOGURU_LEVEL"] = "INFO"

In [11]:
from loguru import logger

In [12]:
import PyPDF2


def load_pdf(pdf_file: str) -> str | None:
    try:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        return "\n".join(page.extract_text() for page in pdf_reader.pages)
    except Exception as e:
        logger.exception(e)
        return None

## Function to process all questions for a single document

In [13]:
import time

from ragatouille import RAGPretrainedModel
from ragatouille.data import CorpusProcessor


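def process_document(
    document_file,
    document_data,
    model,
):
    """Answer every question in `document_data` about one PDF.

    Returns two dicts keyed by row index: the model's answers and the
    predicted sections (always None in this notebook).
    """
    logger.info("Setting up RAG")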
    RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    corpus_processor = CorpusProcessor()
    documents = corpus_processor.process_corpus([load_pdf(document_file)])
    RAG.encode([x["content"] for x in documents])

    logger.info("Predicting")
    answers = {}
    sections = {}
    for index, row in document_data.iterrows():
        if model.n > 0 and model.n % 9 == 0:
            logger.info("Waiting for 60 seconds")
            time.sleep(60)
        question = row["question"]
        question_part, *options = question.split("?")

        logger.info(f"Question: {question}")
        results = RAG.search_encoded_docs(query=question_part, k=3)
        current_info = "\n".join(result["content"] for result in results)
        logger.info(current_info[:100])

        answer = model.model.generate_content(
            [f"This is the document: {current_info}", question]
        )
        logger.info(answer.text)
        answers[index] = answer.text.strip()
        sections[index] = None
        model.n += 1

    return answers, sections
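
To illustrate how `process_document` builds its retrieval query, the sketch below splits a multiple-choice question at `?` using the same unpacking as the loop above. The sample question is made up for illustration and is not taken from the benchmark.

```python
# Hypothetical question in a "question? -A: ... -B: ..." style (not benchmark data).
question = "Which color is the sky? -A: Green -B: Blue -C: Red"

# Same unpacking as in process_document: the text before the first "?" becomes
# the retrieval query, and everything after it is collected into `options`.
question_part, *options = question.split("?")

print(question_part)  # Which color is the sky
print(options)        # [' -A: Green -B: Blue -C: Red']
```

Only `question_part` is sent to the retriever, while the full `question`, including its options, is passed to the model.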

## Load Model

In [14]:
from structured_qa.model_loaders import load_gemini_model

In [15]:
SYSTEM_PROMPT = """
You are a rigorous assistant answering questions.
You must only answer based on the current information available which is:

```
{CURRENT_INFO}
```

If the current information available is not enough to answer the question,
you must return the "I need more info" string and nothing else.

If the current information is enough to answer, you must return one of the following formats:
- YES/NO (for boolean questions)
- Number (for numeric questions)
- Single letter (for multiple-choice questions)
"""

In [16]:
model = load_gemini_model("gemini-2.0-flash-exp", system_prompt=SYSTEM_PROMPT)
model.n = 0
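
The `n` attribute set here is the request counter that `process_document` uses for throttling: it waits 60 seconds whenever `n` is a positive multiple of 9. A minimal sketch of that check, using a hypothetical `should_pause` helper that is not part of `structured_qa`:

```python
def should_pause(request_count: int, every: int = 9) -> bool:
    """Return True when a pause is due before the next request.

    Hypothetical helper mirroring `model.n > 0 and model.n % 9 == 0`.
    """
    return request_count > 0 and request_count % every == 0


# Request counts (out of the first 20) that would trigger the 60-second wait.
pause_points = [n for n in range(20) if should_pause(n)]
print(pause_points)  # [9, 18]
```

Because the counter lives on the model object rather than inside `process_document`, it keeps counting across documents.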

# Run Benchmark

In [None]:
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd

logger.info("Loading input data")
data = pd.read_csv("structured_qa.csv")
data["pred_answer"] = [None] * len(data)
data["pred_section"] = [None] * len(data)
for document_link, document_data in data.groupby("document"):
    logger.info(f"Downloading document {document_link}")
    downloaded_document = Path(f"{Path(document_link).name}.pdf")
    if not downloaded_document.exists():
        urlretrieve(document_link, downloaded_document)
        logger.info(f"Downloaded {document_link} to {downloaded_document}")
    else:
        logger.info(f"File {downloaded_document} already exists")

    answers, sections = process_document(downloaded_document, document_data, model)

    for index in document_data.index:
        data.loc[index, "pred_answer"] = str(answers[index]).upper()
        data.loc[index, "pred_section"] = sections[index]

data.to_csv("results.csv")

In [18]:
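results = pd.read_csv("results.csv")
for index, result in results.iterrows():
    # `result` is a copy of the row, so keep the stripped value in a variable.
    pred_answer = result["pred_answer"].strip()
    results.loc[index, "pred_answer"] = pred_answer
    # Count a prediction as correct if it starts with the expected answer,
    # optionally preceded by "-".
    if pred_answer.startswith((f"-{result['answer']}", f"{result['answer']}")):
        results.loc[index, "pred_answer"] = result["answer"]
# Show the rows where the prediction still differs from the expected answer.
results.loc[results["answer"] != results["pred_answer"]]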

| Unnamed: 0.1 | Unnamed: 0 | document | type | section | question | answer | pred_answer | pred_section |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | 10 | https://arxiv.org/pdf/1706.03762 | Scientific Paper | 5.4 Regularization | What was the dropout rate used for the base mo... | 0.1 | 0. 1 | |
| 26 | 26 | https://authorsalliance.org/wp-content/uploads... | Techincal Documentation | CHAPTER 5: WHERE DO YOU WANT TO MAKE YOUR WORK... | Are Gold Open Access and Green Open Access mut... | NO | YES | |
| 28 | 28 | https://arxiv.org/pdf/2201.11903 | Scientific Report | 3.1 Experimental Setup | How many large language models were evaluated? | 5 | FIVE | |
| 29 | 29 | https://arxiv.org/pdf/2201.11903 | Scientific Report | 3.1 Experimental Setup | How many benchmarks were used to evaluate arit... | 5 | FIVE | |
| 33 | 33 | https://arxiv.org/pdf/2201.11903 | Scientific Report | 3.4 Robustness of Chain of Thought | How many annotators provided independent chain... | 3 | THREE | |
| 34 | 34 | https://arxiv.org/pdf/2201.11903 | Scientific Report | 3.2 Results | How many random samples were examined to under... | 100 | 50 | |
| 37 | 37 | https://github.com/mozilla-ai/structured-qa/re... | Board Game | CARD AND TILE EFFECTS | How many different races are there? | 6 | I NEED MORE INFO | |
| 42 | 42 | https://github.com/mozilla-ai/structured-qa/re... | Board Game | CARD AND TILE COSTS | Can a player pay coins to compensate for missi... | YES | NO | |
| 45 | 45 | https://github.com/mozilla-ai/structured-qa/re... | Board Game | CARD AND TILE EFFECTS | Which type of cards provide coins? -A: Gray -B... | B | I NEED MORE INFO | |
| 57 | 57 | https://github.com/mozilla-ai/structured-qa/re... | Board Game | CLEANUP PHASE | Is there a cleanup phase in the final round? | NO | YES | |
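
The comparison cell above treats a prediction as correct when it starts with the expected answer, optionally preceded by `-`. The sketch below applies the same rule to plain dictionaries so its effect is easier to see. The rows and the `normalize` helper are invented for illustration and are not benchmark data or part of `structured_qa`.

```python
# Toy rows, invented for illustration only (not benchmark data).
rows = [
    {"answer": "B", "pred_answer": "-B: BLUE"},
    {"answer": "YES", "pred_answer": "YES."},
    {"answer": "5", "pred_answer": "FIVE"},
    {"answer": "5", "pred_answer": "50"},
]


def normalize(answer: str, pred_answer: str) -> str:
    """Collapse a prediction to the expected answer when it starts with it."""
    pred = pred_answer.strip()
    if pred.startswith((f"-{answer}", answer)):
        return answer
    return pred


normalized = [normalize(r["answer"], r["pred_answer"]) for r in rows]
print(normalized)  # ['B', 'YES', 'FIVE', '5']
```

Two things stand out in this toy run: a spelled-out number such as `FIVE` is not matched against `5`, and prefix matching is permissive, so a prediction of `50` is accepted for an expected answer of `5`.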


In [19]:
accuracy = sum(results["answer"] == results["pred_answer"]) / len(results)
accuracy

0.8640776699029126