# Structured Q&A

Source code: https://github.com/mozilla-ai/structured-qa

Docs: https://mozilla-ai.github.io/structured-qa

## Installing dependencies

In [1]:
%pip install git+https://github.com/mozilla-ai/structured-qa.git@5-add-benchmark

Collecting git+https://github.com/mozilla-ai/structured-qa.git@5-add-benchmark
  Cloning https://github.com/mozilla-ai/structured-qa.git (to revision 5-add-benchmark) to /tmp/pip-req-build-nwtt45ou
  Running command git clone --filter=blob:none --quiet https://github.com/mozilla-ai/structured-qa.git /tmp/pip-req-build-nwtt45ou
  Running command git checkout -b 5-add-benchmark --track origin/5-add-benchmark
  Switched to a new branch '5-add-benchmark'
  Branch '5-add-benchmark' set up to track remote branch '5-add-benchmark' from 'origin'.
  Resolved https://github.com/mozilla-ai/structured-qa.git to commit c5ee8e63ab951b740147be2d69c2f00549043734
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone


In [2]:
!wget https://raw.githubusercontent.com/mozilla-ai/structured-qa/refs/heads/5-add-benchmark/benchmark/structured_qa.csv

--2025-02-03 14:30:33--  https://raw.githubusercontent.com/mozilla-ai/structured-qa/refs/heads/5-add-benchmark/benchmark/structured_qa.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21441 (21K) [text/plain]
Saving to: ‘structured_qa.csv.2’


2025-02-03 14:30:33 (100 MB/s) - ‘structured_qa.csv.2’ saved [21441/21441]



## Setup

In [3]:
import os
import google.generativeai as genai
from google.colab.userdata import get, SecretNotFoundError

try:
    genai.configure(api_key=get("GOOGLE_API_KEY"))
except SecretNotFoundError as e:
    raise RuntimeError("Please set the GOOGLE_API_KEY secret to your API key") from e
os.environ["LOGURU_LEVEL"] = "INFO"

In [4]:
from loguru import logger

## Function to process all questions for a single document

In [5]:
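import json
import time


def process_document_questions(
    document_file,
    document_data,
    model,
):
    """Upload one PDF to Gemini and ask it every question in `document_data`.

    Returns two dicts keyed by the row index of `document_data`:
    the predicted answers and the predicted sections.
    """
    logger.info("Uploading file")
    file = genai.upload_file(document_file, mime_type="application/pdf")
    # Poll until the uploaded file has finished processing and can be used in prompts.
    while file.state.name == "PROCESSING":
        logger.debug("Waiting for file to be processed.")
        time.sleep(2)
        file = genai.get_file(file.name)

    logger.info("Predicting")
    answers = {}
    sections = {}
    for index, row in document_data.iterrows():
        # Pause after every 9 requests (counted across documents via `model.n`),
        # presumably to stay within the API's per-minute request limit.
        if model.n > 0 and model.n % 9 == 0:
            logger.info("Waiting for 60 seconds")
            time.sleep(60)
        question = row["question"]
        logger.info(f"Question: {question}")
        try:
            response = model.model.generate_content([file, question])
            logger.info(response.text)
            # The reply is parsed as JSON with "answer" and "section" keys.
            response_json = json.loads(response.text)
        except Exception as e:
            # Record a failure instead of reading a stale or undefined `response`.
            logger.error(f"Failed to get a valid response: {e}")
            response_json = {"answer": "Error", "section": "Error"}
        answers[index] = response_json["answer"]
        sections[index] = response_json["section"]
        model.n += 1
    return answers, sections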
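`process_document_questions` pauses for 60 seconds after every 9 requests, presumably to stay under a per-minute request quota. The sketch below isolates that throttling pattern into a standalone helper; the class name `EveryNThrottle` and its parameters are hypothetical (not part of `structured_qa`), and the sleep function is injectable so the behavior can be checked without actually waiting.

```python
import time
from typing import Callable


class EveryNThrottle:
    """Hypothetical helper: pause for `pause_seconds` after every `every_n` requests.

    Mirrors the `model.n > 0 and model.n % 9 == 0` check used in the notebook.
    """

    def __init__(
        self,
        every_n: int = 9,
        pause_seconds: float = 60.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.every_n = every_n
        self.pause_seconds = pause_seconds
        self.sleep = sleep
        self.count = 0  # number of requests sent so far

    def before_request(self) -> None:
        # Never pause before the first request; afterwards, pause whenever
        # the number of requests already sent is a multiple of every_n.
        if self.count > 0 and self.count % self.every_n == 0:
            self.sleep(self.pause_seconds)

    def after_request(self) -> None:
        self.count += 1


if __name__ == "__main__":
    pauses = []  # record requested pauses instead of sleeping
    throttle = EveryNThrottle(every_n=9, pause_seconds=60, sleep=pauses.append)
    for _ in range(20):
        throttle.before_request()
        # ... send one request here ...
        throttle.after_request()
    print(pauses)  # [60, 60]: before the 10th and the 19th request
```

Keeping the counter on a separate object, rather than attaching `n` to the model wrapper, makes the throttle reusable across models; the trade-off is one more object to pass into the processing function.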

## Load Model

In [6]:
from structured_qa.model_loaders import load_gemini_model

In [7]:
SYSTEM_PROMPT = """
You are a rigorous assistant answering questions.
You must only answer based on the current information available which is:

```
{CURRENT_INFO}
```

If the current information available is not enough to answer the question,
you must return the "I need more info" string and nothing else.

If the current information is enough to answer, you must return one of the following formats:
- YES/NO (for boolean questions)
- Number (for numeric questions)
- Single letter (for multiple-choice questions)
"""

In [8]:
model = load_gemini_model("gemini-2.0-flash-exp", system_prompt=SYSTEM_PROMPT)
model.n = 0

## Run Benchmark

In [9]:
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd


logger.info("Loading input data")
data = pd.read_csv("structured_qa.csv")
data["pred_answer"] = [None] * len(data)
data["pred_section"] = [None] * len(data)

for document_link, document_data in data.groupby("document"):
    logger.info(f"Downloading document {document_link}")
    downloaded_document = Path(f"{Path(document_link).name}.pdf")
    if not downloaded_document.exists():
        urlretrieve(document_link, downloaded_document)
        logger.info(f"Downloaded {document_link} to {downloaded_document}")
    else:
        logger.info(f"File {downloaded_document} already exists")

    # Upload the PDF once and ask all of its questions against the same file.
    answers, sections = process_document_questions(
        downloaded_document, document_data, model
    )

    # `document_data` keeps the row labels of `data`, so predictions land on the right rows.
    for index in document_data.index:
        data.loc[index, "pred_answer"] = str(answers[index]).upper()
        data.loc[index, "pred_section"] = sections[index]

data.to_csv("results.csv")

[32m2025-02-03 14:30:36.222[0m | [1mINFO    [0m | [36m__main__[0m:[36m<cell line: 0>[0m:[36m7[0m - [1mLoading input data[0m
[32m2025-02-03 14:30:36.233[0m | [1mINFO    [0m | [36m__main__[0m:[36m<cell line: 0>[0m:[36m13[0m - [1mDownloading document https://aiindex.stanford.edu/wp-content/uploads/2024/05/HAI_AI-Index-Report-2024.pdf[0m
[32m2025-02-03 14:30:36.236[0m | [1mINFO    [0m | [36m__main__[0m:[36m<cell line: 0>[0m:[36m19[0m - [1mFile HAI_AI-Index-Report-2024.pdf.pdf already exists[0m
[32m2025-02-03 14:30:36.238[0m | [1mINFO    [0m | [36m__main__[0m:[36mprocess_document_questions[0m:[36m10[0m - [1mUploading file[0m
[32m2025-02-03 14:30:38.199[0m | [1mINFO    [0m | [36m__main__[0m:[36mprocess_document_questions[0m:[36m17[0m - [1mPredicting[0m
[32m2025-02-03 14:30:38.209[0m | [1mINFO    [0m | [36m__main__[0m:[36mprocess_document_questions[0m:[36m26[0m - [1mQuestion: which type of risk was identified as the leadin

## Results

In [10]:
results = pd.read_csv("results.csv")
results.loc[results["answer"] != results["pred_answer"]]

| Unnamed: 0.1 | Unnamed: 0 | document | section | question | answer | pred_answer | pred_section |
|---|---|---|---|---|---|---|---|
| 5 | 5 | https://arxiv.org/pdf/1706.03762 | 3.5 Positional Encoding | Does the final model use learned positional em... | NO | YES | 6.2 Model Variations |
| 39 | 39 | https://github.com/mozilla-ai/structured-qa/re... | CHAPTER OVERVIEW | Can you take a Chapter card and a Landmark til... | NO | YES | Turn overview |
| 44 | 44 | https://github.com/mozilla-ai/structured-qa/re... | CARD AND TILE EFFECTS | Can you use a symbol more than once per turn? | NO | YES | 5. CARD AND TILE EFFECTS |
| 78 | 78 | https://docs.nvidia.com/cuda/pdf/CUDA_C_Progra... | 23.1. What is Lazy Loading? | Can you enable lazy loading by setting the env... | NO | YES | 23.1. What is Lazy Loading? |
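Each row also records the section the model cited. Exact string comparison is strict: in the table above, `5. CARD AND TILE EFFECTS` and `CARD AND TILE EFFECTS` name the same section but are not equal strings. The sketch below normalizes titles by dropping a leading numeric prefix and case before comparing. This is an assumed heuristic, not the benchmark's own scoring rule, and `normalize_section` / `section_matches` are hypothetical names; note it would also strip a title that genuinely starts with a number (e.g. a year).

```python
import re

import pandas as pd

# Leading numbering such as "5. ", "3.5 " or "23.1. " (assumed heuristic).
_SECTION_NUMBER = re.compile(r"^\s*\d+(\.\d+)*\.?\s*")


def normalize_section(title) -> str:
    """Lower-case a section title and drop any leading numeric prefix."""
    return _SECTION_NUMBER.sub("", str(title)).strip().lower()


def section_matches(results: pd.DataFrame) -> pd.Series:
    """Row-wise True where the predicted section names the reference section."""
    expected = results["section"].map(normalize_section)
    predicted = results["pred_section"].map(normalize_section)
    return expected == predicted
```

With the notebook's `results`, `section_matches(results).mean()` would give a section-level hit rate alongside the answer accuracy.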


In [11]:
accuracy = sum(results["answer"] == results["pred_answer"]) / len(results)
accuracy

0.9611650485436893
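The overall accuracy of 0.961 corresponds to 99 correct answers out of 103 questions (the four misses are listed above). Because questions are grouped by source document, a per-document breakdown can show whether errors concentrate in particular PDFs. A minimal pandas sketch, assuming the `results` DataFrame and column names loaded above; `accuracy_by_document` is a hypothetical helper name.

```python
import pandas as pd


def accuracy_by_document(results: pd.DataFrame) -> pd.DataFrame:
    """Per-document question count and answer accuracy, worst documents first."""
    correct = results["answer"] == results["pred_answer"]
    return (
        results.assign(correct=correct)
        .groupby("document")["correct"]
        .agg(questions="count", accuracy="mean")
        .sort_values("accuracy")
    )
```

Usage in the notebook: `accuracy_by_document(results)`.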