# Structured Q&A

Source code: https://github.com/mozilla-ai/structured-qa

Docs: https://mozilla-ai.github.io/structured-qa

## Installing dependencies

In [1]:
!git clone --single-branch --branch 5-add-benchmark https://github.com/mozilla-ai/structured-qa

Cloning into 'structured-qa'...
remote: Enumerating objects: 724, done.
remote: Counting objects: 100% (162/162), done.
remote: Compressing objects: 100% (101/101), done.
remote: Total 724 (delta 100), reused 74 (delta 61), pack-reused 562 (from 1)
Receiving objects: 100% (724/724), 2.23 MiB | 6.64 MiB/s, done.
Resolving deltas: 100% (382/382), done.


In [9]:
%pip install ./structured-qa

Processing ./structured-qa
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: structured-qa
  Building wheel for structured-qa (pyproject.toml) ... done
  Created wheel for structured-qa: filename=structured_qa-0.3.3.dev71+gae325d3-py3-none-any.whl size=16241 sha256=2defc7ec99afa5814e6713aee2aca5a6364f8b2a2b97e206aa16ccabf91113eb
  Stored in directory: /root/.cache/pip/wheels/b8/d1/8b/1585580e7787d68790745653775eb485d52a0d5386b616c827
Successfully built structured-qa
Installing collected packages: structured-qa
  Attempting uninstall: structured-qa
    Found existing installation: structured-qa 0.3.3.dev71+gae325d3
    Uninstalling structured-qa-0.3.3.dev71+gae325d3:
      Successfully uninstalled structured-qa-0.3.3.dev71+gae325d3
Successfully installed structured-qa-0.3.3.dev71+gae325d3


# Setup

In [3]:
import os
import google.generativeai as genai
from google.colab.userdata import get, SecretNotFoundError

try:
    genai.configure(api_key=get("GOOGLE_API_KEY"))
except SecretNotFoundError as e:
    raise RuntimeError("Please set the GOOGLE_API_KEY secret to your API key") from e
os.environ["LOGURU_LEVEL"] = "INFO"
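The cell above reads the key from Colab secrets, and `google.colab` is only importable inside Colab. When running the notebook elsewhere, one option is to read the key from an environment variable instead. This is a minimal sketch under that assumption; `get_api_key` and its `environ` parameter are hypothetical names introduced here and are not part of `structured_qa`.

```python
import os


def get_api_key(name="GOOGLE_API_KEY", environ=None):
    """Hypothetical helper: read the API key from the environment."""
    # Allow a custom mapping to be passed in so the lookup is easy to test.
    environ = os.environ if environ is None else environ
    key = environ.get(name)
    if not key:
        raise RuntimeError(f"Please set the {name} environment variable to your API key")
    return key


# Usage with an explicit mapping (a placeholder value, not a real key).
print(get_api_key(environ={"GOOGLE_API_KEY": "placeholder-key"}))
```

The returned value could then be passed to `genai.configure(api_key=...)` in place of the Colab `get(...)` call.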

In [4]:
from loguru import logger

## Function to process all questions for a single section

In [5]:
import json
import time


def process_section_questions(
    section_file,
    section_data,
    model,
):
    logger.info("Predicting")
    answers = {}
    sections = {}
    for index, row in section_data.iterrows():
        if model.n > 0 and model.n % 10 == 0:  # crude rate limit: pause after every 10 requests
            logger.info("Waiting for 60 seconds")
            time.sleep(60)
        question = row["question"]
        logger.info(f"Question: {question}")
        response = model.model.generate_content([section_file.read_text(), question])  # full section text as context
        logger.info(response.text)
        response_json = json.loads(response.text)  # expects {"answer": ...}
        answers[index] = response_json["answer"]
        sections[index] = None  # no section is predicted in this benchmark
        model.n += 1  # request counter shared across sections
    return answers, sections
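The pause inside `process_section_questions` is a simple form of client-side rate limiting: before each request after the first, the loop checks whether the running request count is a multiple of 10 and, if so, sleeps for 60 seconds. The same idea can be sketched as a standalone helper. `RequestThrottle`, its parameters, and the injectable `sleep` argument are hypothetical names introduced for illustration; they are not part of `structured_qa` or the Gemini client.

```python
import time


class RequestThrottle:
    """Hypothetical helper: pause for `pause_seconds` after every `every` requests."""

    def __init__(self, every=10, pause_seconds=60, sleep=time.sleep):
        self.every = every
        self.pause_seconds = pause_seconds
        self.sleep = sleep  # injectable so the pause can be replaced in tests
        self.n = 0  # number of requests made so far

    def before_request(self):
        # Mirrors the notebook's check: never pause before the first call,
        # then pause whenever the running count is a multiple of `every`.
        if self.n > 0 and self.n % self.every == 0:
            self.sleep(self.pause_seconds)
        self.n += 1


# Usage with a fake sleep that only records the requested pauses.
pauses = []
throttle = RequestThrottle(every=10, pause_seconds=60, sleep=pauses.append)
for _ in range(25):
    throttle.before_request()
print(pauses)  # two pauses: before the 11th and the 21st request
```

Keeping the counter on one object, as the notebook does with `model.n`, means the limit applies across all sections rather than restarting for each one.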

## Load Model

In [10]:
from structured_qa.model_loaders import load_gemini_model

In [11]:
SYSTEM_PROMPT = """
You are given an input document and a question.
You can only answer the question based on the information in the document.
You will return a JSON object with one key: "answer".
In `"answer"`, you will return the answer in one of the following JSON contents:
- Yes/No (for boolean questions)
Is the model an LLM?
{
  "answer": "No"
}
- Single number (for numeric questions)
How many layers does the model have?
{
  "answer": 12
}
- Single letter (for multiple-choice questions)
What is the activation function used in the model? -A: ReLU -B: Sigmoid -C: Tanh
{
  "answer": "C"
}
"""
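The prompt allows exactly three answer shapes: Yes/No, a single number, or a single letter. A response could be checked against those shapes before it is recorded. The following is a minimal sketch under that reading of the prompt; `is_valid_answer` and `parse_answer` are hypothetical helpers, not functions from `structured_qa`.

```python
import json


def is_valid_answer(answer):
    """Return True if `answer` matches one of the three formats in the prompt."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(answer, bool):
        return False
    if isinstance(answer, (int, float)):
        return True  # single number
    if isinstance(answer, str):
        if answer.lower() in {"yes", "no"}:
            return True  # boolean question
        return len(answer) == 1 and answer.isalpha()  # multiple-choice letter
    return False


def parse_answer(raw_text):
    """Parse a raw JSON response; return the answer, or None if it is malformed."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or "answer" not in payload:
        return None
    answer = payload["answer"]
    return answer if is_valid_answer(answer) else None


print(parse_answer('{"answer": "No"}'))
print(parse_answer('{"answer": 12}'))
print(parse_answer('{"answer": "maybe"}'))
```

Returning `None` for malformed responses is one possible design choice; raising an error would make failures more visible at the cost of stopping the run.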

In [12]:
model = load_gemini_model(
    "gemini-2.0-flash-exp",
    system_prompt=SYSTEM_PROMPT,
    generation_config={
        "response_mime_type": "application/json",
    },
)
model.n = 0

# Run Benchmark

In [13]:
from pathlib import Path

import pandas as pd


logger.info("Loading input data")
data = pd.read_csv("structured-qa/benchmark/structured_qa.csv")
data["pred_answer"] = [None] * len(data)
data["pred_section"] = [None] * len(data)

for section_name, section_data in data.groupby("section"):
    section_file = Path(f"structured-qa/benchmark/perfect_context/{section_name}.txt")

    answers, sections = process_section_questions(section_file, section_data, model)

    for index in section_data.index:
        data.loc[index, "pred_answer"] = str(answers[index]).upper()
        data.loc[index, "pred_section"] = sections[index]

data.to_csv("results.csv")

2025-01-28 14:06:32.788 | INFO     | __main__:<cell line: 0>:6 - Loading input data
2025-01-28 14:06:32.808 | INFO     | __main__:process_section_questions:10 - Predicting
2025-01-28 14:06:32.810 | INFO     | __main__:process_section_questions:18 - Question: In billions, how many trainable parameters does GPT-3 have?
2025-01-28 14:06:34.791 | INFO     | __main__:process_section_questions:20 - {
  "answer": 175
}
2025-01-28 14:06:34.792 | INFO     | __main__:process_section_questions:18 - Question: Does LoRA introduce additional inference latency compared to full fine-tuning?
2025-01-28 14:06:36.592 | INFO     | __main__:process_section_questions:20 - {
  "answer": "No"
}
...
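The benchmark loop stores each prediction as `str(answers[index]).upper()`, so answers of different JSON types all end up as upper-case text before they are compared with the reference column. A small sketch of that normalization follows; `normalize_prediction` is a hypothetical wrapper around the same expression.

```python
def normalize_prediction(answer):
    """Hypothetical helper mirroring `str(answers[index]).upper()`."""
    return str(answer).upper()


examples = ["No", 175, "c"]
print([normalize_prediction(a) for a in examples])  # ['NO', '175', 'C']
```

One caveat of this approach: a number returned as a float keeps its decimal part, so `normalize_prediction(12.0)` gives `"12.0"`, which would not match a reference answer written as `12`.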

# Results

In [14]:
results = pd.read_csv("results.csv")
results.loc[results["answer"] != results["pred_answer"]]

| Unnamed: 0.1 | Unnamed: 0 | document | section | question | answer | pred_answer | pred_section |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 43 | 43 | https://arxiv.org/pdf/2201.11903 | 3.4 Robustness of Chain of Thought | How many annotators provided independent chain... | 3 | 2 | |
| 52 | 52 | https://github.com/mozilla-ai/structured-qa/re... | CARD AND TILE COSTS | Can a player pay coins to compensate for missi... | YES | NO | |


In [15]:
accuracy = sum(results["answer"] == results["pred_answer"]) / len(results)
accuracy

0.9797979797979798
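The accuracy is the fraction of rows whose prediction exactly equals the reference answer. With the two mismatches listed above, the reported value corresponds to 97 correct answers out of 99. The sketch below reproduces that arithmetic in plain Python; the lists are synthetic stand-ins for the `answer` and `pred_answer` columns, and `exact_match_accuracy` is a hypothetical helper.

```python
def exact_match_accuracy(answers, predictions):
    """Fraction of positions where the prediction equals the reference answer."""
    if len(answers) != len(predictions):
        raise ValueError("answers and predictions must have the same length")
    matches = sum(a == p for a, p in zip(answers, predictions))
    return matches / len(answers)


# Synthetic data: 99 questions, 2 of them answered incorrectly.
answers = ["YES"] * 99
predictions = ["YES"] * 97 + ["NO"] * 2
print(exact_match_accuracy(answers, predictions))
```

Exact matching is strict: it treats any difference in case, type, or formatting as an error, which is why predictions are normalized before being written to `results.csv`.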