# Structured Q&A

Source code: https://github.com/mozilla-ai/structured-qa

Docs: https://mozilla-ai.github.io/structured-qa

## Installing dependencies

In [12]:
!git clone --single-branch --branch 5-add-benchmark https://github.com/mozilla-ai/structured-qa

fatal: destination path 'structured-qa' already exists and is not an empty directory.


In [13]:
%pip install ./structured-qa

Processing ./structured-qa
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: structured-qa
  Building wheel for structured-qa (pyproject.toml) ... done
  Created wheel for structured-qa: filename=structured_qa-0.3.3.dev111+g97049d6-py3-none-any.whl size=13247 sha256=a18780844c04a51ee112c6177e9ed610585c15d00f2e5f2dfefa1dcd4d27f151
  Stored in directory: /root/.cache/pip/wheels/b8/d1/8b/1585580e7787d68790745653775eb485d52a0d5386b616c827
Successfully built structured-qa
Installing collected packages: structured-qa
  Attempting uninstall: structured-qa
    Found existing installation: structured-qa 0.3.3.dev111+g97049d6
    Uninstalling structured-qa-0.3.3.dev111+g97049d6:
      Successfully uninstalled structured-qa-0.3.3.dev111+g97049d6
Successfully installed structured-qa-0.3.3.dev111+g97049d6


# Setup

In [14]:
import os
import google.generativeai as genai
from google.colab.userdata import get, SecretNotFoundError

try:
    genai.configure(api_key=get("GOOGLE_API_KEY"))
except SecretNotFoundError as e:
    raise RuntimeError("Please set the GOOGLE_API_KEY secret to your API key") from e
os.environ["LOGURU_LEVEL"] = "INFO"

In [15]:
from loguru import logger

## Function to process all questions for a single section

In [16]:
import json
import time


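def process_section_questions(
    section_file,
    section_data,
    model,
):
    """Ask the model every question in `section_data`, using `section_file` as context.

    Returns two dicts keyed by row index: the predicted answers and the
    predicted sections (always None in this notebook).
    """
    logger.info("Predicting")
    answers = {}
    sections = {}
    # Read the section text once instead of on every iteration.
    context = section_file.read_text()
    for index, row in section_data.iterrows():
        # Crude rate limiting: pause for a minute after every 10 requests.
        if model.n > 0 and model.n % 10 == 0:
            logger.info("Waiting for 60 seconds")
            time.sleep(60)
        question = row["question"]
        logger.info(f"Question: {question}")
        response = model.model.generate_content([context, question])
        answers[index] = response.text
        sections[index] = None
        model.n += 1
    return answers, sections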
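
The loop above pauses for 60 seconds after every 10 requests. As a minimal sketch, the same counting logic could be isolated in a small helper so it can be exercised without actually waiting. `RequestThrottle` and its parameters are hypothetical names introduced here for illustration; they are not part of `structured_qa`.

```python
import time
from typing import Callable


class RequestThrottle:
    """Hypothetical helper: pause after every `batch_size` requests."""

    def __init__(
        self,
        batch_size: int = 10,
        pause_seconds: float = 60.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.batch_size = batch_size
        self.pause_seconds = pause_seconds
        self.sleep = sleep  # injectable so the logic can be checked without waiting
        self.n = 0  # number of requests accounted for so far

    def wait_if_needed(self) -> bool:
        """Call once before each request; returns True if a pause was taken."""
        paused = self.n > 0 and self.n % self.batch_size == 0
        if paused:
            self.sleep(self.pause_seconds)
        self.n += 1
        return paused
```

Under these assumptions, calling `throttle.wait_if_needed()` at the top of the loop would replace the `model.n` bookkeeping, and passing a stub as `sleep` makes the pause points easy to inspect.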

## Load Model

In [17]:
from structured_qa.model_loaders import load_gemini_model

In [18]:
SYSTEM_PROMPT = """
You are a rigorous assistant answering questions.
You must only answer based on the current information available which is:

```
{CURRENT_INFO}
```

If the current information available is not enough to answer the question,
you must return the "I need more info" string and nothing else.

If the current information is enough to answer, you must return one of the following formats:
- YES/NO (for boolean questions)
- Number (for numeric questions)
- Single letter (for multiple-choice questions)
"""

In [19]:
model = load_gemini_model("gemini-2.0-flash-exp", system_prompt=SYSTEM_PROMPT)
model.n = 0
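
The system prompt allows only four response shapes: YES/NO, a number, a single letter, or the string "I need more info". The sketch below shows one possible way to check which shape a raw response has before scoring it. `classify_response` and its labels are hypothetical and are not used elsewhere in this notebook.

```python
import re


def classify_response(text: str) -> str:
    """Hypothetical helper: label a raw model response by the format it matches."""
    cleaned = text.strip().upper()
    if cleaned == "I NEED MORE INFO":
        return "need_more_info"
    if cleaned in {"YES", "NO"}:
        return "boolean"
    # Integers and simple decimals, optionally negative.
    if re.fullmatch(r"-?\d+(\.\d+)?", cleaned):
        return "number"
    # A single letter, as expected for multiple-choice questions.
    if re.fullmatch(r"[A-Z]", cleaned):
        return "letter"
    return "other"
```

Responses labelled `"other"` would be the ones worth inspecting by hand, since they do not follow any of the formats the prompt asks for.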

# Run Benchmark

In [20]:
from pathlib import Path

import pandas as pd


logger.info("Loading input data")
data = pd.read_csv("structured-qa/benchmark/structured_qa.csv")
data["pred_answer"] = [None] * len(data)
data["pred_section"] = [None] * len(data)

for section_name, section_data in data.groupby("section"):
    section_file = Path(f"structured-qa/benchmark/perfect_context/{section_name}.txt")

    answers, sections = process_section_questions(section_file, section_data, model)

    for index in section_data.index:
        data.loc[index, "pred_answer"] = str(answers[index]).upper()
        data.loc[index, "pred_section"] = sections[index]

data.to_csv("results.csv")

2025-02-04 15:05:23.818 | INFO     | __main__:<cell line: 0>:6 - Loading input data
2025-02-04 15:05:23.835 | INFO     | __main__:process_section_questions:10 - Predicting
2025-02-04 15:05:23.838 | INFO     | __main__:process_section_questions:18 - Question: In billions, how many trainable parameters does GPT-3 have?
2025-02-04 15:05:25.515 | INFO     | __main__:process_section_questions:18 - Question: Does LoRA introduce additional inference latency compared to full fine-tuning?
2025-02-04 15:05:26.763 | INFO     | __main__:process_section_questions:10 - Predicting
2025-02-04 15:05:26.765 | INFO     | __main__:process_section_questions:18 - Question: What fire resistance must vertic

# Results

In [23]:
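results = pd.read_csv("results.csv")
for index, result in results.iterrows():
    # Strip surrounding whitespace before comparing with the expected answer.
    pred_answer = result["pred_answer"].strip()
    results.loc[index, "pred_answer"] = pred_answer
    # Accept predictions that start with the expected answer, optionally preceded by "-".
    if pred_answer.startswith((f"-{result['answer']}", f"{result['answer']}")):
        results.loc[index, "pred_answer"] = result["answer"]
# Show only the rows where the prediction still differs from the expected answer.
results.loc[results["answer"] != results["pred_answer"]]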

| Unnamed: 0.1 | Unnamed: 0 | document | type | section | question | answer | pred_answer | pred_section |
|---|---|---|---|---|---|---|---|---|
| 10 | 10 | https://arxiv.org/pdf/1706.03762 | Scientific Paper | 5.4 Regularization | What was the dropout rate used for the base mo... | 0.1 | 0. 1 | |
| 51 | 51 | https://github.com/mozilla-ai/structured-qa/re... | Board Game | LOOKOUT PHASE | Is there a limit to the number of cards a play... | NO | YES | |
| 92 | 92 | https://arxiv.org/pdf/2302.13971 | Scientific Report | 2.3 Optimizer | What value was used for the weight decay? | 0.1 | 0. 1 | |
| 94 | 94 | https://arxiv.org/pdf/2302.13971 | Scientific Report | 3 Main results | Was the model compared against GPT-4? | NO | I NEED MORE INFO | |
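
Two of the mismatches above differ from the expected answer only by a stray space inside the number (`0. 1` vs `0.1`). As a sketch of one possible post-processing step, the helper below collapses inner whitespace when the result is a plain number and leaves every other answer alone. `normalize_answer` is a hypothetical function; the results reported in this notebook were computed without it, so applying it would change the accuracy figure.

```python
import re


def normalize_answer(raw: str) -> str:
    """Hypothetical helper: uppercase, trim, and repair numbers split by whitespace."""
    text = raw.strip().upper()
    compact = re.sub(r"\s+", "", text)
    # Only collapse inner whitespace when the result is a plain number,
    # so multi-word answers such as "I NEED MORE INFO" stay untouched.
    if re.fullmatch(r"-?\d+(\.\d+)?", compact):
        return compact
    return text


print(normalize_answer("0. 1"))              # 0.1
print(normalize_answer("I need more info"))  # I NEED MORE INFO
```

Whether such a repair is appropriate is a judgment call: it treats a formatting slip as a correct answer, which may or may not match what the benchmark is meant to measure.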


In [24]:
accuracy = sum(results["answer"] == results["pred_answer"]) / len(results)
accuracy

0.9611650485436893
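
The accuracy is the fraction of rows whose prediction equals the expected answer. With the 4 mismatches listed above, the reported value is consistent with 99 correct answers out of 103 questions; the total of 103 is inferred from the printed accuracy, not read from the CSV. A minimal pure-Python sketch of the same computation follows, where `accuracy` is a hypothetical helper name.

```python
def accuracy(expected: list[str], predicted: list[str]) -> float:
    """Hypothetical helper: fraction of positions where the two lists agree."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy of an empty benchmark")
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)


# Toy inputs for illustration only: one of four predictions is wrong.
print(accuracy(["YES", "NO", "0.1", "A"], ["YES", "NO", "0. 1", "A"]))  # 0.75

# Consistency check against the value printed above (103 is inferred).
print(abs(99 / 103 - 0.9611650485436893) < 1e-12)  # True
```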