# Structured Q&A

Source code: https://github.com/mozilla-ai/structured-qa

Docs: https://mozilla-ai.github.io/structured-qa

## Installing dependencies

In [1]:
%pip install git+https://github.com/mozilla-ai/structured-qa.git@5-add-benchmark

Collecting git+https://github.com/mozilla-ai/structured-qa.git@5-add-benchmark
  Cloning https://github.com/mozilla-ai/structured-qa.git (to revision 5-add-benchmark) to /tmp/pip-req-build-e3shdxjv
  Running command git clone --filter=blob:none --quiet https://github.com/mozilla-ai/structured-qa.git /tmp/pip-req-build-e3shdxjv
  Running command git checkout -b 5-add-benchmark --track origin/5-add-benchmark
  Switched to a new branch '5-add-benchmark'
  Branch '5-add-benchmark' set up to track remote branch '5-add-benchmark' from 'origin'.
  Resolved https://github.com/mozilla-ai/structured-qa.git to commit d125b79bb7bfdeab751f93bac37039950fe24ce5
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting fire (from structured-qa==0.3.3.dev56+gd125b79)
  Downloading fire-0.7.0.tar.gz (87 kB)

In [2]:
!wget https://raw.githubusercontent.com/mozilla-ai/structured-qa/refs/heads/5-add-benchmark/benchmark/structured_qa.csv

--2025-01-23 10:00:12--  https://raw.githubusercontent.com/mozilla-ai/structured-qa/refs/heads/5-add-benchmark/benchmark/structured_qa.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14711 (14K) [text/plain]
Saving to: ‘structured_qa.csv’


2025-01-23 10:00:19 (5.28 MB/s) - ‘structured_qa.csv’ saved [14711/14711]



# Setup

In [10]:
import os
import google.generativeai as genai

GEMINI_API_KEY = None
if not GEMINI_API_KEY:
    raise ValueError("Please set the GEMINI_API_KEY variable to your API key")
os.environ["LOGURU_LEVEL"] = "INFO"
genai.configure(api_key=GEMINI_API_KEY)
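
Hard-coding the key in the notebook makes it easy to commit by accident. A minimal sketch of an alternative, assuming the key is exported in an environment variable named `GEMINI_API_KEY` (the helper `read_api_key` is hypothetical and not part of `structured-qa`):

```python
import os


def read_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key stored in `env_var` (hypothetical helper)."""
    # Treat an unset or blank variable the same way the cell above treats None.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"Please set the {env_var} environment variable to your API key")
    return key
```

`GEMINI_API_KEY = read_api_key()` could then stand in for the `GEMINI_API_KEY = None` line.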

## Function to Download Document

In [None]:
from pathlib import Path
from urllib.request import urlretrieve

from loguru import logger


def download_document(url, output_file):
    """Download `url` to `output_file` unless the file already exists."""
    if not Path(output_file).exists():
        urlretrieve(url, output_file)
        logger.info(f"Downloaded {url} to {output_file}")
    else:
        logger.info(f"File {output_file} already exists")

## Function to Process a Single Document

In [14]:
import json
import time


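def process_document(
    document_file,
    document_data,
    model,
):
    """Upload a PDF to Gemini and answer every question in `document_data`.

    Returns two dicts keyed by the DataFrame index: the predicted answers
    and the predicted sections.
    """
    logger.info("Uploading file")
    file = genai.upload_file(document_file, mime_type="application/pdf")
    # Poll until the uploaded file has finished processing.
    while file.state.name == "PROCESSING":
        logger.debug("Waiting for file to be processed.")
        time.sleep(2)
        file = genai.get_file(file.name)

    logger.info("Predicting")
    n = 0
    answers = {}
    sections = {}
    for index, row in document_data.iterrows():
        # Sleep for 60 seconds after every 9 questions of this document.
        if n > 0 and n % 9 == 0:
            logger.info("Waiting for 60 seconds")
            time.sleep(60)
        question = row["question"]
        logger.info(f"Question: {question}")
        # The uploaded file and the question are sent together as the prompt.
        response = model.model.generate_content([file, question])
        logger.info(response.text)
        # The model is configured to reply with JSON (see "Load Model").
        response_json = json.loads(response.text)
        answers[index] = response_json["answer"]
        sections[index] = response_json["section"]
        n += 1
    return answers, sections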
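
The pacing rule in the loop is easy to misread: the counter `n` restarts at 0 for each document, and the loop sleeps before the 10th, 19th, 28th, ... question of that document. A small sketch that lists the positions at which the pause fires (`pause_points` and `batch_size` are illustrative names, not part of the notebook):

```python
def pause_points(num_questions, batch_size=9):
    """Return the 0-based question positions before which the loop sleeps."""
    # Mirrors the condition `n > 0 and n % 9 == 0` from process_document.
    return [n for n in range(num_questions) if n > 0 and n % batch_size == 0]


print(pause_points(20))  # [9, 18]
```

A document with 20 questions therefore incurs two 60-second pauses. The pause is presumably there to stay under an API rate limit, although the notebook does not state the limit.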

## Load Model

In [None]:
from structured_qa.model_loaders import load_gemini_model

In [7]:
FULL_CONTEXT_PROMPT = """
You are given an input document and a question.
You can only answer the question based on the information in the document.
You will return a JSON name with two keys: "section" and "answer".
In `"section"`, you will return the name of the section where you found the answer.
In `"answer"`, you will return the answer one of the following JSON:
- Yes/No (for boolean questions)
Is the model an LLM?
{
  "section": "1. Introduction",
  "answer": "No"
}
- Single number (for numeric questions)
How many layers does the model have?
{
  "section": "2. Architecture",
  "answer": 12
}
- Single letter (for multiple-choice questions)
What is the activation function used in the model?
-A: ReLU
-B: Sigmoid
-C: Tanh
{
  "section": "2. Architecture",
  "answer": "C"
}
"""
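
The prompt asks for exactly two keys, but nothing in the notebook checks that the model complied. A minimal validation sketch, assuming the response text is a single JSON object (`parse_response` is a hypothetical helper, not part of `structured-qa`):

```python
import json


def parse_response(text):
    """Parse a model response and return (answer, section)."""
    payload = json.loads(text)
    if not isinstance(payload, dict):
        raise ValueError("Expected a JSON object with 'section' and 'answer'")
    # Fail early with a clear message if either key from the prompt is absent.
    missing = {"section", "answer"} - set(payload)
    if missing:
        raise ValueError(f"Response is missing keys: {sorted(missing)}")
    return payload["answer"], payload["section"]


print(parse_response('{"section": "2. Architecture", "answer": 12}'))
# (12, '2. Architecture')
```

Raising a descriptive error here would surface a malformed reply at the question that caused it, rather than as a bare `KeyError` inside the loop.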

In [8]:
model = load_gemini_model(
    "gemini-2.0-flash-exp",
    system_prompt=FULL_CONTEXT_PROMPT,
    generation_config={
        "response_mime_type": "application/json",
    },
)

# Run Benchmark

In [15]:
import pandas as pd


logger.info("Loading input data")
data = pd.read_csv("structured_qa.csv")
data["pred_answer"] = [None] * len(data)
data["pred_section"] = [None] * len(data)

for document_link, document_data in data.groupby("document"):
    logger.info(f"Downloading document {document_link}")
    downloaded_document = Path(f"{Path(document_link).name}.pdf")
    download_document(document_link, downloaded_document)

    answers, sections = process_document(downloaded_document, document_data, model)

    for index in document_data.index:
        data.loc[index, "pred_answer"] = str(answers[index]).upper()
        data.loc[index, "pred_section"] = sections[index]

data.to_csv("results.csv")

2025-01-23 10:03:52.899 | INFO     | __main__:<cell line: 0>:4 - Loading input data
2025-01-23 10:03:52.906 | INFO     | __main__:<cell line: 0>:10 - Downloading document https://arxiv.org/pdf/1706.03762
2025-01-23 10:03:52.908 | INFO     | __main__:process_document:12 - Uploading file
2025-01-23 10:03:56.426 | INFO     | __main__:process_document:19 - Predicting
2025-01-23 10:03:56.428 | INFO     | __main__:process_document:28 - Question: What type of architecture does the model use? -A: decoder only -B: encoder only -C: encoder-decoder
2025-01-23 10:04:03.326 | INFO     | __main__:process_document:30 - {
  "section": "3 Model Architecture",
  "answer": "C"
}
...
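
The benchmark loop builds the local file name from the last path component of the URL and always appends `.pdf`. The sketch below shows what that produces, plus a hypothetical variant (`local_pdf_name`, not in the notebook) that avoids a doubled suffix when the URL already ends in `.pdf`. `PurePosixPath` is used so the result does not depend on the operating system:

```python
from pathlib import PurePosixPath


def notebook_name(url):
    """File name as built in the benchmark loop: last URL component + '.pdf'."""
    return f"{PurePosixPath(url).name}.pdf"


def local_pdf_name(url):
    """Hypothetical variant that appends '.pdf' only when it is missing."""
    name = PurePosixPath(url).name
    return name if name.lower().endswith(".pdf") else f"{name}.pdf"


print(notebook_name("https://arxiv.org/pdf/1706.03762"))         # 1706.03762.pdf
print(notebook_name("https://arxiv.org/pdf/2106.09685v2.pdf"))   # 2106.09685v2.pdf.pdf
print(local_pdf_name("https://arxiv.org/pdf/2106.09685v2.pdf"))  # 2106.09685v2.pdf
```

The doubled suffix does not break the download, because the same name is used for both the existence check and the upload.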

# Results

In [16]:
results = pd.read_csv("results.csv")
results.loc[results["answer"] != results["pred_answer"]]

| Unnamed: 0.1 | Unnamed: 0 | document | section | question | answer | pred_answer | pred_section |
|---|---|---|---|---|---|---|---|
| 5 | 5 | https://arxiv.org/pdf/1706.03762 | 3 Model Architecture | Does the final model use learned positional em... | NO | YES | 6.2 Model Variations |
| 13 | 13 | https://arxiv.org/pdf/2210.05189 | 3 Experimental Results | How many parameters are in the y = x^2 toy mod... | 14 | 39 | Table 1. Computation and memory analysis of to... |
| 18 | 18 | https://arxiv.org/pdf/2106.09685v2.pdf | 5.5 Scaling Up to GPT-3 | How much memory is saved (in GB) when training... | 850 | 0.85 | 4. Practical Benefits and Limitations. |
| 22 | 22 | https://eur-lex.europa.eu/legal-content/EN/TXT... | Prohibited AI Practices (Article 5) | Which type of AI systems are banned by the AI ... | C | B | Article 5 |
| 39 | 39 | https://authorsalliance.org/wp-content/uploads... | Chapter 5 Where do you want to make your work ... | Are Gold Open Access and Green Open Access mut... | NO | YES | Chapter 5 |
| 74 | 74 | https://commission.europa.eu/document/download... | Natural lighting | What is the daylight factor required for façad... | 0.7 | 0.7% | 4. VISUAL COMFORT |
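
Not all six mismatches are the same kind of error. Row 74 differs only by a trailing percent sign (`0.7` vs `0.7%`), whereas row 18 differs by a factor of 1000 (`850` vs `0.85`), which may reflect a unit difference. The sketch below shows a more lenient comparison that ignores a trailing `%`; `normalize` is an illustrative helper, not part of the notebook:

```python
def normalize(value):
    """Upper-case an answer and drop surrounding whitespace and a trailing '%'."""
    return str(value).strip().upper().rstrip("%").strip()


print(normalize("0.7%") == normalize(0.7))  # True
print(normalize(0.85) == normalize(850))    # False
```

Whether a percent sign should count as a match is a judgment call for the benchmark, so this is an illustration rather than a proposed change to the metric.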


In [17]:
accuracy = sum(results["answer"] == results["pred_answer"]) / len(results)
accuracy

0.9210526315789473
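
As a sanity check, the accuracy should agree with the six mismatched rows listed above. Assuming those six rows are the only mismatches, the implied number of questions can be recovered from the two figures:

```python
import math

mismatches = 6                 # rows shown in the mismatch table
accuracy = 0.9210526315789473  # value reported by the notebook

# accuracy = (total - mismatches) / total  =>  total = mismatches / (1 - accuracy)
total = round(mismatches / (1 - accuracy))
print(total)  # 76

# The reconstructed fraction agrees with the reported accuracy.
assert math.isclose((total - mismatches) / total, accuracy)
```

The reported accuracy is therefore consistent with 70 correct answers out of 76 questions.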