# Assertion Generation for Existing Questions

This notebook demonstrates how to generate assertions for existing data-local and data-global questions that were created without them (for example, when `max_assertions=0` was used or assertion generation was disabled during question generation).

This is useful when you want to retroactively add assertion-based evaluation capabilities to existing question sets.

In [13]:
# Copyright (c) 2025 Microsoft Corporation.

import sys

sys.path.insert(1, "../../../")

In [14]:
%load_ext dotenv
%dotenv

The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv
cannot find .env file


In [15]:
import json
import logging
import os

from dataclasses import asdict
from pydantic import SecretStr

from benchmark_qed.autoq.io.question import (
    load_questions,
    save_questions,
)
from benchmark_qed.config.llm_config import LLMConfig, LLMProvider
from benchmark_qed.llm.factory import ModelFactory

logging.basicConfig(level=logging.INFO)

# Silence per-request logs from httpx. getLogger always returns a logger, so no None check is needed.
logging.getLogger("httpx").setLevel(logging.ERROR)

## Configuration

In [16]:
# DATA CONFIGS
OUTPUT_QUESTIONS_PATH = "../../output/AP_news/legacy_questions"

# MODEL CONFIGS
API_KEY = SecretStr(os.getenv("OPENAI_API_KEY", ""))
LLM_MODEL = "gpt-4.1"
# Adjust these based on your model. For example, some reasoning models do not support temperature settings.
LLM_PARAMS = {
    "temperature": 0.0,
    "seed": 42,
}
# Maximum number of concurrent requests. Adjust this based on your model capacity.
CONCURRENT_REQUESTS = 8

llm = ModelFactory.create_chat_model(
    model_config=LLMConfig(
        model=LLM_MODEL,
        api_key=API_KEY,
        llm_provider=LLMProvider.OpenAIChat,
        call_args=LLM_PARAMS,
    )
)
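
The configuration above falls back to an empty string when `OPENAI_API_KEY` is not set, so a missing key would only surface later as a failed request. A minimal sketch of failing early instead, assuming that behavior is wanted; `require_env` is a hypothetical helper written for this example and is not part of `benchmark_qed`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        msg = f"Environment variable {name} is not set; add it to your .env file or shell."
        raise RuntimeError(msg)
    return value
```

With this helper, the key could be loaded as `API_KEY = SecretStr(require_env("OPENAI_API_KEY"))`.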

## Generate Assertions for Existing Data-Local Questions

Generate assertions for data-local questions that were previously created without assertions.

In [17]:
from benchmark_qed.autoq.question_gen.data_questions.assertion_gen.local_claim_assertion_gen import (
    LocalClaimAssertionGenerator,
)

# Load existing data-local questions from disk
# Replace with your actual path to existing questions
existing_local_questions = load_questions(
    f"{OUTPUT_QUESTIONS_PATH}/data_local_questions/selected_questions.json"
)

print(f"Loaded {len(existing_local_questions)} existing data-local questions")

# Initialize local assertion generator
local_assertion_generator = LocalClaimAssertionGenerator(
    llm=llm,
    max_assertions=10,  # Maximum number of assertions per question
)

# Generate assertions for each question
updated_local_questions = []
for question in existing_local_questions:
    # Check if question has claims to generate assertions from
    if not (hasattr(question, 'attributes') and question.attributes and 'claims' in question.attributes):
        print(f"Question {question.id} has no claims, skipping assertion generation...")
        updated_local_questions.append(question)
        continue
    
    try:
        # Generate assertions from the question's claims
        claims = question.attributes['claims']
        assertion_result = await local_assertion_generator.agenerate_assertions(
            question_text=question.text,
            claims=claims
        )
        
        # Store the generated assertions on the question as plain dicts.
        # question.attributes is known to be non-empty here because of the claims check above.
        question.attributes['assertions'] = [asdict(assertion) for assertion in assertion_result.assertions]

        print(f"Generated {len(assertion_result.assertions)} assertions for question: {question.text[:100]}...")
        
    except Exception as e:
        # Keep the question without assertions so that one failure does not stop the run
        print(f"Failed to generate assertions for question {question.id}: {e}")
    
    updated_local_questions.append(question)

# Save updated questions with assertions
save_questions(
    updated_local_questions,
    f"{OUTPUT_QUESTIONS_PATH}/data_local_questions/",
    "selected_questions_with_assertions",
)

print(f"Saved {len(updated_local_questions)} data-local questions with assertions")

Loaded 50 existing data-local questions
Generated 2 assertions for question: What amendment was made to Article 34 of the French Constitution regarding abortion rights in March ...
Generated 6 assertions for question: What was the Florida Supreme Court's decision regarding the proposed abortion rights amendment for t...
Generated 7 assertions for question: What are the limitations of early prenatal ultrasounds and genetic screenings in detecting fetal abn...
Generated 2 assertions for question: From which countries do Mexican cartels source precursor chemicals for fentanyl production between 2...
Generated 6 assertions for question: Why are junior doctors in South Korea striking in February 2024?...
Generated 9 assertions for question: What are the contributing factors to the rise in flu and COVID-19 infections in the United States du...
Generated 6 assertions for question: What changes to the yellow flag law did Maine Governor Janet Mills propose between January 2024 and ...
Generated
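
The loop in the cell above awaits one question at a time. If throughput matters, the per-question calls could be run with bounded concurrency. The following is a minimal sketch using only `asyncio`; `gather_bounded` and its `worker` argument are hypothetical names introduced for illustration, not part of `benchmark_qed`.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


async def gather_bounded(
    items: list[Any],
    worker: Callable[[Any], Awaitable[Any]],
    limit: int = 8,
) -> list[Any]:
    """Run worker(item) for every item with at most `limit` calls in flight.

    Results come back in input order. A failed call yields its exception
    object instead of raising, so one failure does not cancel the others.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: Any) -> Any:
        # The semaphore blocks here once `limit` workers are already running.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(
        *(run_one(item) for item in items), return_exceptions=True
    )
```

In this notebook, `worker` would wrap the call to `local_assertion_generator.agenerate_assertions` for a single question, and `limit` could be set to `CONCURRENT_REQUESTS`. Whether the provider's rate limits tolerate that level of parallelism depends on your deployment.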

## Generate Assertions for Existing Data-Global Questions

Generate assertions for data-global questions that were previously created without assertions. Global assertion generation uses a map-reduce approach: it first generates local assertions from the referenced questions, then consolidates them into global assertions.
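
To make the map-reduce flow more concrete, here is a toy sketch of two of its non-LLM parts: splitting claims into batches, and ranking candidate assertions before keeping as many as fit a token budget. It is modeled loosely on the log messages this notebook prints (batches created from claims, assertions ranked by score and source count, a token limit) and is not the library's implementation. The `Candidate` shape, the word-count token estimate, and both function names are assumptions made for illustration; in the library, the final consolidation is done by the LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate assertion produced by the map step (hypothetical shape)."""

    statement: str
    score: int
    source_count: int


def make_batches(claims: list[str], batch_size: int) -> list[list[str]]:
    """Split claims into consecutive batches of at most batch_size items."""
    return [claims[i : i + batch_size] for i in range(0, len(claims), batch_size)]


def select_within_budget(
    candidates: list[Candidate], max_tokens: int
) -> list[Candidate]:
    """Rank by (score, source_count) descending and keep candidates until the budget is spent.

    Token cost is approximated by whitespace word count, which is an assumption;
    a real pipeline would use the model's tokenizer.
    """
    ranked = sorted(candidates, key=lambda c: (c.score, c.source_count), reverse=True)
    selected: list[Candidate] = []
    used = 0
    for candidate in ranked:
        cost = len(candidate.statement.split())
        if used + cost > max_tokens:
            break
        selected.append(candidate)
        used += cost
    return selected


batches = make_batches([f"claim {i}" for i in range(1062)], batch_size=100)
print(len(batches))  # 11
```

With `batch_size=100`, 1062 claims split into 11 batches, which is consistent with the "Created 11 batches from 1062 simple claims" message in the log output of the next cell.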

In [18]:
from benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen import (
    GlobalClaimAssertionGenerator,
)

# Load existing data-global questions from disk
# Replace with your actual path to existing questions
existing_global_questions = load_questions(
    f"{OUTPUT_QUESTIONS_PATH}/data_global_questions/selected_questions.json"
)

print(f"Loaded {len(existing_global_questions)} existing data-global questions")

# Initialize global assertion generator
global_assertion_generator = GlobalClaimAssertionGenerator(
    llm=llm,
    max_assertions=10,  # Maximum number of assertions per question
    batch_size=100,  # Batch size for processing multiple claims
    max_data_tokens=32000,  # Maximum input data tokens for the reduce step that generates the final assertions
    concurrent_coroutines=CONCURRENT_REQUESTS,
)

# Generate assertions for each global question
updated_global_questions = []
for question in existing_global_questions:
    # Check if question has claims to generate assertions from
    if not (hasattr(question, 'attributes') and question.attributes and 'claims' in question.attributes):
        print(f"Global question {question.id} has no claims, skipping assertion generation...")
        updated_global_questions.append(question)
        continue
    
    try:
        claims = question.attributes['claims']
        
        # Generate assertions using the map-reduce approach
        assertion_result = await global_assertion_generator.agenerate_assertions(
            question_text=question.text,
            claims=claims
        )
        
        # Store the generated assertions on the question as plain dicts.
        # question.attributes is known to be non-empty here because of the claims check above.
        question.attributes['assertions'] = [asdict(assertion) for assertion in assertion_result.assertions]

        print(f"Generated {len(assertion_result.assertions)} assertions for global question: {question.text[:100]}...")
        
    except Exception as e:
        # Keep the question without assertions so that one failure does not stop the run
        print(f"Failed to generate assertions for global question {question.id}: {e}")
    
    updated_global_questions.append(question)

# Save updated questions with assertions
save_questions(
    updated_global_questions,
    f"{OUTPUT_QUESTIONS_PATH}/data_global_questions/",
    "selected_questions_with_assertions",
)

print(f"Saved {len(updated_global_questions)} data-global questions with assertions")

INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP CONTEXT: Created 11 batches from 1062 simple claims
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Processing 11 batches in parallel


Loaded 50 existing data-global questions


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 11 out of 11 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 196 assertions from 11 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 196 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 196 of 196 assertions within 6520 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 196 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 196 assertions into 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.gl

Generated 10 assertions for global question: Across the dataset, what are the key legislative and policy changes impacting healthcare access and ...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 9 out of 9 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 146 assertions from 9 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 146 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 146 of 146 assertions within 5172 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 146 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 146 assertions into 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.globa

Generated 10 assertions for global question: Across the dataset, what are the key societal challenges and responses observed in various regions a...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 6 out of 6 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 112 assertions from 6 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 112 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 112 of 112 assertions within 3904 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 112 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 112 assertions into 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.globa

Generated 10 assertions for global question: Across the dataset, what are the key public health challenges and the measures being taken to addres...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 6 out of 6 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 111 assertions from 6 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 111 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 111 of 111 assertions within 3742 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 111 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 111 assertions into 8
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global

Generated 8 assertions for global question: Across the dataset, what measures have been implemented to address various safety concerns and their...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 2 out of 2 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 29 assertions from 2 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 29 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 29 of 29 assertions within 1046 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 29 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 29 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim

Generated 9 assertions for global question: Across the dataset, what are the broader legal and societal implications of recent high-profile lega...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 2 out of 2 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 24 assertions from 2 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 24 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 24 of 24 assertions within 874 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 24 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 24 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 9 assertions for global question: Across the dataset, what are the key challenges and advancements in the development and implementati...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 2 out of 2 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 34 assertions from 2 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 34 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 34 of 34 assertions within 1053 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 34 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 34 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim

Generated 9 assertions for global question: Across the dataset, what are the key objectives and impacts of various government policies on public...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 20 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 20 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 20 of 20 assertions within 677 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 20 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 20 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 9 assertions for global question: Across the dataset, what are the key factors influencing mental health policies and public responses...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 18 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 18 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 18 of 18 assertions within 597 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 18 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 18 assertions into 7
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 7 assertions for global question: Across the dataset, what are the key legal and policy challenges affecting reproductive rights?...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 18 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 18 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 18 of 18 assertions within 587 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 18 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 18 assertions into 8
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 8 assertions for global question: Across the dataset, what are the common causes and responses to food contamination incidents?...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 15 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 15 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 15 of 15 assertions within 528 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 15 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 15 assertions into 7
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 7 assertions for global question: Across the dataset, what are the key ethical challenges and considerations in recent medical and leg...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 12 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 12 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 12 of 12 assertions within 432 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 12 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 12 assertions into 7
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 7 assertions for global question: Across the dataset, how are AI technologies being utilized to address various social and health issu...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 12 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 12 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 12 of 12 assertions within 478 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 12 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 12 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what are the primary health risks identified in various regions and their associ...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 17 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 17 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 17 of 17 assertions within 571 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 17 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 17 assertions into 6
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 6 assertions for global question: Across the dataset, what are the economic and health impacts of policy changes and interventions in ...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 15 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 15 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 15 of 15 assertions within 486 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 15 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 15 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 9 assertions for global question: Across the dataset, what are the common impacts of security risks on critical infrastructure and nat...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 13 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 13 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 13 of 13 assertions within 375 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 13 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 13 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what are the common medical conditions and treatments experienced by notable ind...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 15 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 15 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 15 of 15 assertions within 468 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 15 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 15 assertions into 6
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 6 assertions for global question: Across the dataset, what legislative measures are being implemented to protect youth from harmful su...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 19 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 19 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 19 of 19 assertions within 620 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 19 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 19 assertions into 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim

Generated 10 assertions for global question: Across the dataset, what are the key challenges and considerations impacting the development, approv...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 15 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 15 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 15 of 15 assertions within 529 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 15 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 15 assertions into 6
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 6 assertions for global question: Across the dataset, what are the key factors influencing the reporting and support of male sexual as...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 15 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 15 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 15 of 15 assertions within 594 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 15 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 15 assertions into 8
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 8 assertions for global question: Across the dataset, how do recent geopolitical events and policies impact international relations an...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 12 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 12 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 12 of 12 assertions within 406 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 12 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 12 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 9 assertions for global question: Across the dataset, what are the primary legal arguments and decisions surrounding state abortion ba...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 18 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 18 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 18 of 18 assertions within 617 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 18 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 18 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 9 assertions for global question: Across the dataset, what are the key considerations for managing health-related expenses in the Unit...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 13 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 13 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 13 of 13 assertions within 419 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 13 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 13 assertions into 6
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 6 assertions for global question: Across the dataset, what are the common patterns of human rights violations reported?...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 20 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 20 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 20 of 20 assertions within 552 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 20 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 20 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 9 assertions for global question: Across the dataset, what are the primary goals and strategies of recent economic policy proposals?...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 13 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 13 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 13 of 13 assertions within 415 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 13 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 13 assertions into 4
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 4 assertions for global question: Across the dataset, what were the key outcomes and commitments made at the COP28 summit in Dubai reg...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 13 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 13 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 13 of 13 assertions within 439 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 13 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 13 assertions into 6
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 6 assertions for global question: Across the dataset, what are the implications of using sedatives on individuals restrained by police...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 12 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 12 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 12 of 12 assertions within 365 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 12 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 12 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what are the key factors influencing the production and distribution of fentanyl...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 10 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 10 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 10 of 10 assertions within 323 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 10 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 10 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what are the key innovations and their purposes showcased at CES 2024 in Las Veg...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 12 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 12 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 12 of 12 assertions within 387 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 12 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 12 assertions into 6
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 6 assertions for global question: Across the dataset, what are the key factors influencing the outcomes of lawsuits related to Roundup...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 12 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 12 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 12 of 12 assertions within 396 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 12 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 12 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what were the key justifications and regulatory changes implemented by governmen...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 11 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 11 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 11 of 11 assertions within 384 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 11 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 11 assertions into 4
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 4 assertions for global question: Across the dataset, what are the key benefits of flu vaccination during the flu season?...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 10 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 10 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 10 of 10 assertions within 312 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 10 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 10 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what are the significant concerns and controversies associated with consumer pro...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 8 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 8 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 8 of 8 assertions within 303 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 8 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 8 assertions into 4
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assert

Generated 4 assertions for global question: Across the dataset, what are the verified facts and contexts behind significant events and items rel...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 13 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 13 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 13 of 13 assertions within 436 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 13 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 13 assertions into 6
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 6 assertions for global question: Across the dataset, what are the key factors contributing to health disparities among different popu...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 14 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 14 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 14 of 14 assertions within 475 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 14 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 14 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, how do state budget allocations impact public institutions and employee compensa...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 10 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 10 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 10 of 10 assertions within 302 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 10 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 10 assertions into 4
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 4 assertions for global question: Across the dataset, how does media coverage influence public perception of health-related announceme...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 13 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 13 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 13 of 13 assertions within 415 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 13 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 13 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 9 assertions for global question: Across the dataset, what are the key factors influencing the implementation and impact of the Summer...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 15 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 15 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 15 of 15 assertions within 472 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 15 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 15 assertions into 9
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 9 assertions for global question: Across the dataset, what are the common health benefits associated with popular dietary choices and ...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 10 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 10 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 10 of 10 assertions within 361 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 10 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 10 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what were the key developments and regulatory actions in xenotransplantation in ...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 9 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 9 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 9 of 9 assertions within 309 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 9 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 9 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assert

Generated 5 assertions for global question: Across the dataset, how do various housing policies impact affordability and discrimination issues?...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 10 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 10 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 10 of 10 assertions within 337 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 10 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 10 assertions into 3
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 3 assertions for global question: Across the dataset, how are healthcare providers and insurers addressing challenges related to treat...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 10 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 10 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 10 of 10 assertions within 269 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 10 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 10 assertions into 7
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 7 assertions for global question: Across the dataset, what are the key legislative responses to mass shootings and assault weapons?...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 11 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 11 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 11 of 11 assertions within 351 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 11 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 11 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what are the recommended strategies for managing financial stress during the hol...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 10 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 10 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 10 of 10 assertions within 332 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 10 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 10 assertions into 5
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 5 assertions for global question: Across the dataset, what are the impacts and proposed changes to the Medicaid estate recovery proces...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 8 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 8 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 8 of 8 assertions within 255 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 8 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 8 assertions into 4
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assert

Generated 4 assertions for global question: Across the dataset, how do different initiatives help individuals cope with feelings of loneliness d...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 11 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 11 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 11 of 11 assertions within 372 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 11 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 11 assertions into 7
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 7 assertions for global question: Across the dataset, what are the primary factors influencing the effectiveness of humanitarian aid i...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 7 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 7 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 7 of 7 assertions within 229 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 7 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 7 assertions into 4
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assert

Generated 4 assertions for global question: Across the dataset, what are the primary health concerns and financial implications associated with ...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 15 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 15 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 15 of 15 assertions within 503 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 15 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 15 assertions into 6
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_

Generated 6 assertions for global question: Across the dataset, what are the key strategies and benefits of implementing shade equity plans in v...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 10 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 10 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 10 of 10 assertions within 333 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 10 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 10 assertions into 6

Generated 6 assertions for global question: Across the dataset, what are the key factors influencing sleep duration and quality?...


INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:MAP RESPONSES: Successfully processed 1 out of 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Merging 16 assertions from 1 batches
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Ranked 16 unique assertions by score and source count
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE CONTEXT: Selected 16 of 16 assertions within 534 tokens (limit: 32000)
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:REDUCE RESPONSE: Consolidating 16 assertions to 10
INFO:benchmark_qed.autoq.question_gen.data_questions.assertion_gen.global_claim_assertion_gen:Successfully consolidated 16 assertions into 8


Generated 8 assertions for global question: Across the dataset, what are the key strategies and challenges in improving electronic waste recycli...
Saved 50 data-global questions with assertions


## Notes on Assertion Generation

**When to Use This Approach:**
- You have existing questions that were generated with `max_assertions=0` or with assertion generation disabled
- You want to add evaluation capabilities to previously generated question sets
- You need to regenerate assertions with different parameters or improved prompts

**Input Requirements:**
- Questions must have `claims` in their `attributes` field
- For data-local questions: claims should be a list of claim dictionaries
- For data-global questions: claims can be in various formats (simple or complex)
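
The input requirements above can be checked before running generation. The following is a minimal sketch that assumes each question has been converted to a plain dictionary (for example with `asdict`); the helper name `has_usable_claims`, the sample data, and the `statement` key inside each claim are hypothetical and not part of the library. It only covers the list-of-dictionaries shape described for data-local questions.

```python
from typing import Any


def has_usable_claims(question: dict[str, Any]) -> bool:
    """Return True if `attributes.claims` is a non-empty list (hypothetical helper)."""
    attributes = question.get("attributes") or {}
    claims = attributes.get("claims")
    return isinstance(claims, list) and len(claims) > 0


# Hypothetical sample data; real questions would come from load_questions().
questions = [
    {"text": "Q1", "attributes": {"claims": [{"statement": "A happened."}]}},
    {"text": "Q2", "attributes": {"claims": []}},  # empty claim list
    {"text": "Q3", "attributes": None},  # no attributes at all
]

ready = [q["text"] for q in questions if has_usable_claims(q)]
print(ready)
```

Only the question with at least one claim passes the check; the other two would be candidates for skipping.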

**Output Format:**
- Assertions are added to the question's `attributes.assertions` field
- Each assertion contains a `statement` that can be used for evaluation
- Questions without valid claims are left unchanged
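
As an illustration of the output format, the sketch below collects the `statement` text from a question's `attributes.assertions` field. It assumes questions are plain dictionaries; the helper `assertion_statements` and the sample question are hypothetical, and real assertions may carry additional fields beyond `statement`.

```python
from typing import Any


def assertion_statements(question: dict[str, Any]) -> list[str]:
    """Collect the `statement` of every assertion on a question (hypothetical helper)."""
    attributes = question.get("attributes") or {}
    assertions = attributes.get("assertions") or []
    return [
        a["statement"]
        for a in assertions
        if isinstance(a, dict) and "statement" in a
    ]


# Hypothetical question shaped like the output described above.
question = {
    "text": "What factors influence sleep quality?",
    "attributes": {
        "claims": [{"statement": "Screen time affects sleep."}],
        "assertions": [
            {"statement": "The answer mentions screen time as a factor."},
            {"statement": "The answer mentions stress as a factor."},
        ],
    },
}

statements = assertion_statements(question)
print(len(statements))
```

A question that was left unchanged because it had no valid claims simply yields an empty list.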

**Configuration Options:**
- `max_assertions`: Maximum number of assertions to generate per question (default: 10)
- `batch_size`: For global questions, controls how many claims are processed together
- `concurrent_coroutines`: Controls parallel processing for global questions
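
To make the roles of `batch_size` and `concurrent_coroutines` concrete, here is a generic sketch of batching claims and bounding concurrency with `asyncio.Semaphore`. It is not the library's implementation: `make_batches`, `process_batches`, and the placeholder worker are hypothetical, and the worker stands in for whatever model call a real generator would make.

```python
import asyncio
from collections.abc import Sequence


def make_batches(claims: Sequence[str], batch_size: int) -> list[list[str]]:
    """Split claims into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(claims[i : i + batch_size]) for i in range(0, len(claims), batch_size)]


async def process_batches(
    batches: list[list[str]], concurrent_coroutines: int
) -> list[int]:
    """Process batches with at most `concurrent_coroutines` running at once."""
    semaphore = asyncio.Semaphore(concurrent_coroutines)

    async def worker(batch: list[str]) -> int:
        async with semaphore:
            await asyncio.sleep(0)  # placeholder for a model call (assumption)
            return len(batch)

    # gather returns results in the same order as the input batches.
    return await asyncio.gather(*(worker(b) for b in batches))


batches = make_batches([f"claim {i}" for i in range(7)], batch_size=3)
sizes = asyncio.run(process_batches(batches, concurrent_coroutines=2))
print(sizes)
```

Inside a notebook cell, where an event loop is already running, `await process_batches(...)` would replace the `asyncio.run(...)` call.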

**Error Handling:**
- Questions without claims are skipped with a warning message
- Individual failures are logged but don't stop the overall process
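
The skip-and-continue behavior described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the library's code: `generate_for_all`, `fake_generate`, and the dictionary-shaped questions are all hypothetical.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("assertion_sketch")


def generate_for_all(
    questions: list[dict[str, Any]],
    generate: Callable[[list[Any]], list[dict[str, str]]],
) -> dict[str, list[str]]:
    """Run `generate` per question, skipping empty inputs and logging failures."""
    summary: dict[str, list[str]] = {"processed": [], "skipped": [], "failed": []}
    for q in questions:
        claims = (q.get("attributes") or {}).get("claims")
        if not claims:
            logger.warning("Skipping question without claims: %s", q.get("text"))
            summary["skipped"].append(q["text"])
            continue
        try:
            q["attributes"]["assertions"] = generate(claims)
            summary["processed"].append(q["text"])
        except Exception:
            # Log the failure with its traceback and move on to the next question.
            logger.exception("Assertion generation failed for: %s", q.get("text"))
            summary["failed"].append(q["text"])
    return summary


def fake_generate(claims: list[Any]) -> list[dict[str, str]]:
    """Stand-in generator that fails on a marker claim (hypothetical)."""
    if "boom" in claims:
        raise RuntimeError("simulated failure")
    return [{"statement": str(c)} for c in claims]


questions = [
    {"text": "Q1", "attributes": {"claims": ["a"]}},
    {"text": "Q2", "attributes": {"claims": []}},
    {"text": "Q3", "attributes": {"claims": ["boom"]}},
    {"text": "Q4", "attributes": {"claims": ["b"]}},
]
summary = generate_for_all(questions, fake_generate)
print(summary)
```

The failure on the third question does not prevent the fourth from being processed, and questions that were skipped or failed keep their original attributes.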