# Assertion Generation for Existing Questions

This notebook shows how to generate assertions for existing data-local and data-global questions that were created without them, for example because `max_assertions=0` was set or assertion generation was disabled during question generation.

This is useful when you want to retroactively add assertion-based evaluation capabilities to existing question sets.

In [None]:
# Copyright (c) 2025 Microsoft Corporation.

import sys

sys.path.insert(1, "../../../")

In [None]:
%load_ext dotenv
%dotenv

In [None]:
import logging
import os
from dataclasses import asdict

from pydantic import SecretStr

from benchmark_qed.autoq.io.question import (
    load_questions,
    save_questions,
)
from benchmark_qed.autoq.question_gen.data_questions.assertion_gen import (
    AssertionValidator,
)
from benchmark_qed.config.llm_config import LLMConfig, LLMProvider
from benchmark_qed.llm.factory import ModelFactory

logging.basicConfig(level=logging.INFO)

# logging.getLogger() always returns a logger, so no None check is needed.
# Silence per-request logs from the httpx client.
logging.getLogger("httpx").setLevel(logging.ERROR)

## Configuration

In [None]:
# DATA CONFIGS
OUTPUT_QUESTIONS_PATH = "../../output/AP_news/questions"

# MODEL CONFIGS
API_KEY = SecretStr(os.getenv("OPENAI_API_KEY", ""))
LLM_MODEL = "gpt-4.1"
LLM_PARAMS = {
    "temperature": 0.0,
    "seed": 42,
}  # Adjust for your model; for example, some reasoning models do not support temperature settings.
CONCURRENT_REQUESTS = (
    8  # Control for request concurrency. Adjust this based on your model capacity.
)

# ASSERTION GENERATION CONFIGS
MAX_ASSERTIONS = 20  # Maximum number of assertions per question
BATCH_SIZE = 100  # Batch size for processing claims in global assertion generation
MAX_DATA_TOKENS = (
    32000  # Maximum input data tokens for the reduce step in global assertions
)
ENABLE_VALIDATION = True  # Validate assertions against sources; set to False to skip
MIN_VALIDATION_SCORE = 3  # Minimum score (1-5) for grounding, relevance, verifiability

llm = ModelFactory.create_chat_model(
    model_config=LLMConfig(
        model=LLM_MODEL,
        api_key=API_KEY,
        llm_provider=LLMProvider.OpenAIChat,
        call_args=LLM_PARAMS,
    )
)

# Create validator if validation is enabled
# Validator checks assertions for:
# - Grounding: Is the assertion supported by source texts?
# - Relevance: Is it useful for evaluating the question?
# - Verifiability: Is it clear and testable?
validator = (
    AssertionValidator(
        llm=llm,
        llm_params=LLM_PARAMS,
        min_criterion_score=MIN_VALIDATION_SCORE,
        concurrent_validations=CONCURRENT_REQUESTS,
    )
    if ENABLE_VALIDATION
    else None
)

if validator:
    print(f"Validation enabled (min score: {MIN_VALIDATION_SCORE}/5)")
else:
    print("Validation disabled")

## Generate Assertions for Existing Data-Local Questions

Generate assertions for data-local questions that were previously created without assertions.

In [None]:
from benchmark_qed.autoq.question_gen.data_questions.assertion_gen.local_claim_assertion_gen import (
    LocalClaimAssertionGenerator,
)

# Load existing data-local questions from disk
# Replace with your actual path to existing questions
existing_local_questions = load_questions(
    f"{OUTPUT_QUESTIONS_PATH}/data_local_questions/selected_questions.json"
)

print(f"Loaded {len(existing_local_questions)} existing data-local questions")

# Initialize local assertion generator with optional validator
local_assertion_generator = LocalClaimAssertionGenerator(
    llm=llm,
    max_assertions=MAX_ASSERTIONS,
    validator=validator,  # Pass validator for quality filtering (None to skip)
)

# Generate assertions for each question
updated_local_questions = []
for question in existing_local_questions:
    # Check if question has claims to generate assertions from
    if not (
        hasattr(question, "attributes")
        and question.attributes
        and "claims" in question.attributes
    ):
        print(f"Question {question.id} has no claims, skipping assertion generation...")
        updated_local_questions.append(question)
        continue

    try:
        # Generate assertions from the question's claims
        claims = question.attributes["claims"]
        assertion_result = await local_assertion_generator.agenerate_assertions(
            question_text=question.text, claims=claims
        )

        # Update question with generated assertions
        if not question.attributes:
            question.attributes = {}
        question.attributes["assertions"] = [
            asdict(assertion) for assertion in assertion_result.assertions
        ]

        print(
            f"Generated {len(assertion_result.assertions)} assertions for question: {question.text[:100]}..."
        )

    except Exception as e:
        print(f"Failed to generate assertions for question {question.id}: {e!s}")

    updated_local_questions.append(question)

# Save updated questions with assertions
save_questions(
    updated_local_questions,
    f"{OUTPUT_QUESTIONS_PATH}/data_local_questions/",
    "selected_questions_with_assertions",
)

print(f"Saved {len(updated_local_questions)} data-local questions with assertions")

## Generate Assertions for Existing Data-Global Questions

Generate assertions for data-global questions that were previously created without assertions. Global assertion generation uses a map-reduce approach, first generating local assertions from referenced questions, then consolidating them into global assertions.
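
As a rough illustration of the map-reduce shape described above, the sketch below splits claims into batches, collects candidate assertions per batch, then merges and caps them. It is not the library's implementation: `batched`, `map_reduce_assertions`, and `toy_map` are hypothetical names, and the toy map function stands in for an LLM call.

```python
from collections.abc import Callable, Iterator


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def map_reduce_assertions(
    claims: list[str],
    map_batch: Callable[[list[str]], list[str]],
    batch_size: int,
    max_assertions: int,
) -> list[str]:
    # Map step: each batch of claims yields candidate assertion statements.
    candidates: list[str] = []
    for batch in batched(claims, batch_size):
        candidates.extend(map_batch(batch))

    # Reduce step (stand-in): drop exact duplicates, keep order, cap the count.
    unique = list(dict.fromkeys(candidates))
    return unique[:max_assertions]


def toy_map(batch: list[str]) -> list[str]:
    """Hypothetical stand-in for an LLM call that turns claims into assertions."""
    return [f"The answer should mention: {claim}" for claim in batch]


result = map_reduce_assertions(
    ["claim A", "claim B", "claim A", "claim C"],
    toy_map,
    batch_size=2,
    max_assertions=2,
)
print(result)
```

In the notebook, `BATCH_SIZE` and `MAX_ASSERTIONS` play the roles of `batch_size` and `max_assertions` here; the real reduce step is additionally bounded by `MAX_DATA_TOKENS`.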

In [None]:
from benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen import (
    GlobalClaimAssertionGenerator,
)

# Load existing data-global questions from disk
# Replace with your actual path to existing questions
existing_global_questions = load_questions(
    f"{OUTPUT_QUESTIONS_PATH}/data_global_questions/selected_questions.json"
)

print(f"Loaded {len(existing_global_questions)} existing data-global questions")

# Initialize global assertion generator with optional validator
global_assertion_generator = GlobalClaimAssertionGenerator(
    llm=llm,
    max_assertions=MAX_ASSERTIONS,
    batch_size=BATCH_SIZE,  # Batch size for processing multiple claims
    max_data_tokens=MAX_DATA_TOKENS,  # max input data tokens for the reduce step
    concurrent_coroutines=CONCURRENT_REQUESTS,
    validator=validator,  # Pass validator for quality filtering (None to skip)
)

# Generate assertions for each global question
updated_global_questions = []
for question in existing_global_questions:
    # Check if question has claims to generate assertions from
    if not (
        hasattr(question, "attributes")
        and question.attributes
        and "claims" in question.attributes
    ):
        print(
            f"Global question {question.id} has no claims, skipping assertion generation..."
        )
        updated_global_questions.append(question)
        continue

    try:
        claims = question.attributes["claims"]

        # Generate assertions using the map-reduce approach
        assertion_result = await global_assertion_generator.agenerate_assertions(
            question_text=question.text, claims=claims
        )

        # Update question with generated assertions
        if not question.attributes:
            question.attributes = {}
        question.attributes["assertions"] = [
            asdict(assertion) for assertion in assertion_result.assertions
        ]

        print(
            f"Generated {len(assertion_result.assertions)} assertions for global question: {question.text[:100]}..."
        )

    except Exception as e:
        print(f"Failed to generate assertions for global question {question.id}: {e!s}")

    updated_global_questions.append(question)

# Save updated questions with assertions
save_questions(
    updated_global_questions,
    f"{OUTPUT_QUESTIONS_PATH}/data_global_questions/",
    "selected_questions_with_assertions",
)

print(f"Saved {len(updated_global_questions)} data-global questions with assertions")

## Notes on Assertion Generation

**When to use this approach:**
- You have existing questions that were generated with `max_assertions=0` or without assertion generation
- You want to add evaluation capabilities to previously generated question sets
- You need to regenerate assertions with different parameters or improved prompts

**Input Requirements:**
- Questions must have `claims` in their `attributes` field
- For data-local questions: claims should be a list of claim dictionaries
- For data-global questions: claims can be in various formats (simple or complex)
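
The precondition above can be pulled into a small helper and run before any LLM calls are made. This is a minimal sketch: `StubQuestion` is a hypothetical stand-in for the library's question type, assuming only that it exposes `id`, `text`, and an optional `attributes` dict, and the claim dictionary keys are illustrative. Unlike the loop in this notebook, the helper also treats an empty `claims` list as missing.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StubQuestion:
    """Hypothetical stand-in for the library's question object."""

    id: str
    text: str
    attributes: Optional[dict[str, Any]] = None


def has_claims(question: Any) -> bool:
    """Return True when the question carries a non-empty `claims` attribute."""
    attributes = getattr(question, "attributes", None)
    return bool(attributes) and bool(attributes.get("claims"))


questions = [
    # The "statement" key is an illustrative assumption about the claim schema.
    StubQuestion("q1", "What happened?", {"claims": [{"statement": "X occurred."}]}),
    StubQuestion("q2", "Why?", {"claims": []}),
    StubQuestion("q3", "Who?"),
]
ready = [q.id for q in questions if has_claims(q)]
print(ready)
```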

**Output Format:**
- Assertions are stored under the `assertions` key of the question's `attributes` dictionary
- Each assertion contains a `statement` that can be used for evaluation
- Questions without claims, or for which generation fails, are passed through unchanged and are still written to the output file
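
To make the resulting shape concrete, the sketch below mimics the update step with a hypothetical `StubAssertion` dataclass. Only the `statement` field is taken from the notes above; the library's assertion type may carry additional fields, and the claim content shown is a placeholder.

```python
from dataclasses import asdict, dataclass


@dataclass
class StubAssertion:
    """Hypothetical assertion type; only `statement` is confirmed by the notes."""

    statement: str


# Existing attributes are preserved; the claim shown is a placeholder.
attributes = {"claims": [{"statement": "placeholder claim"}]}
generated = [
    StubAssertion("The answer names the affected region."),
    StubAssertion("The answer gives the date of the event."),
]

# Same conversion the notebook uses: dataclass instances become plain dicts.
attributes["assertions"] = [asdict(a) for a in generated]

statements = [a["statement"] for a in attributes["assertions"]]
print(statements)
```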

**Configuration Options:**
- `MAX_ASSERTIONS`: Maximum number of assertions to generate per question (this notebook uses 20)
- `ENABLE_VALIDATION`: Set to `True` to validate assertions for quality (this notebook uses `True`)
- `MIN_VALIDATION_SCORE`: Minimum score (1-5) required on each validation criterion (this notebook uses 3)
- `BATCH_SIZE`: For global questions, the number of claims processed together in each batch
- `MAX_DATA_TOKENS`: For global questions, the maximum number of input data tokens in the reduce step
- `CONCURRENT_REQUESTS`: Limits concurrent requests; it is passed to the global generator as `concurrent_coroutines` and to the validator as `concurrent_validations`
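
A concurrency limit like the one above is commonly enforced with a semaphore. The sketch below shows that general pattern with a dummy coroutine; it illustrates the idea only and does not describe how `benchmark_qed` implements it. `run_bounded`, `demo`, and `fake_request` are hypothetical names.

```python
import asyncio


async def run_bounded(coros, limit: int):
    """Await coroutines with at most `limit` running at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(coro):
        async with semaphore:
            return await coro

    # gather preserves the input order of results.
    return await asyncio.gather(*(guarded(c) for c in coros))


async def demo():
    active = 0  # requests currently in flight
    peak = 0  # highest number of simultaneous requests observed

    async def fake_request(i: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for a model call
        active -= 1
        return i * 2

    results = await run_bounded([fake_request(i) for i in range(10)], limit=3)
    return results, peak


results, peak = asyncio.run(demo())
print(results, peak)
```

Inside a notebook, where an event loop is already running, call `await demo()` instead of `asyncio.run(demo())`.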

**Validation:**
When `ENABLE_VALIDATION=True`, each assertion is checked for:
- **Grounding**: Is the assertion factually supported by source texts?
- **Relevance**: Is the assertion useful for evaluating answers to the question?
- **Verifiability**: Is the assertion clear and objectively checkable?

Assertions must score at least `MIN_VALIDATION_SCORE` on all three criteria to pass validation and be included in the final assertion set.
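
The pass rule stated above can be written as a simple predicate. The score dictionary layout and the `passes_validation` name are assumptions for illustration; `AssertionValidator` may represent scores differently.

```python
CRITERIA = ("grounding", "relevance", "verifiability")


def passes_validation(scores: dict[str, int], min_score: int = 3) -> bool:
    """Return True only if every criterion meets the minimum score."""
    # A missing criterion counts as 0, so it fails the check.
    return all(scores.get(name, 0) >= min_score for name in CRITERIA)


good = passes_validation({"grounding": 5, "relevance": 4, "verifiability": 3})
bad = passes_validation({"grounding": 5, "relevance": 2, "verifiability": 5})
print(good, bad)
```

A single low score is enough to reject an assertion, so raising `MIN_VALIDATION_SCORE` tightens all three criteria at once.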