# Google News as Data Source

This notebook demonstrates how to use Google News search as a data source for generating forecasting questions. Google News provides access to recent news articles that can be used as seeds for question generation.

In [None]:
%pip install lightningrod-ai

from IPython.display import clear_output
clear_output()

In [None]:
import os
from datetime import datetime, timedelta
from lightningrod import (
    LightningRod,
    NewsSeedGenerator,
    QuestionGenerator,
    QuestionPipeline,
    WebSearchLabeler,
    FilterCriteria,
    AnswerType,
    AnswerTypeEnum,
)
from lightningrod._generated.models import QuestionRenderer

api_key = os.getenv("LIGHTNINGROD_API_KEY", "your-api-key-here")

client = LightningRod(api_key=api_key)
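The cell above falls back to the placeholder string `"your-api-key-here"` when `LIGHTNINGROD_API_KEY` is unset, so a missing key is not caught at this point. If you prefer to fail immediately, a small guard like the one below works. `require_env` is a hypothetical helper written for this notebook, not part of `lightningrod`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

You would then write `api_key = require_env("LIGHTNINGROD_API_KEY")` in place of the `os.getenv` call.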

## Configure News Seed Generator

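The `NewsSeedGenerator` searches Google News for articles matching your query. You specify the date range (`start_date`, `end_date`), the length of each search interval (`interval_duration_days`), the query itself (`search_query`), and how many articles to fetch for each interval (`articles_per_search`).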

In [None]:
seed_generator = NewsSeedGenerator(
    start_date=datetime(2025, 1, 1),
    end_date=datetime(2025, 1, 31),
    interval_duration_days=7,
    search_query="AI technology announcements",
    articles_per_search=20,
)
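To estimate how many articles this configuration can pull in, it helps to see how a 30-day range divides into 7-day windows. The sketch below assumes the generator splits `[start_date, end_date)` into consecutive, non-overlapping windows of `interval_duration_days` and runs one search of up to `articles_per_search` articles per window. That is an assumption about `NewsSeedGenerator`, not its documented behavior, and `split_into_intervals` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_into_intervals(start: datetime, end: datetime, days: int) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows of `days` days; the last one may be shorter."""
    if days <= 0:
        raise ValueError("days must be positive")
    step = timedelta(days=days)
    windows = []
    cursor = start
    while cursor < end:
        windows.append((cursor, min(cursor + step, end)))
        cursor += step
    return windows


windows = split_into_intervals(datetime(2025, 1, 1), datetime(2025, 1, 31), 7)
for win_start, win_end in windows:
    print(win_start.date(), "->", win_end.date())
# Upper bound on articles if each window runs one search of 20 articles (assumed).
print(len(windows) * 20, "articles at most")
```

Under these assumptions, January 2025 yields five windows (the last covering only January 29 to 31), so at most 100 articles are fetched.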

## Configure Question Generator

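The `QuestionGenerator` creates forecasting questions from the news articles. `examples` and `bad_examples` show the model which kinds of questions to produce and which to avoid, and `FilterCriteria` scores each question against a rubric so that questions scoring below `min_score` are discarded.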

In [None]:
answer_type = AnswerType(answer_type=AnswerTypeEnum.BINARY)

question_generator = QuestionGenerator(
    instructions=(
        "Generate forward-looking questions about AI technology announcements. "
        "Questions should be about future events or outcomes that can be verified later."
    ),
    examples=[
        "Will OpenAI release a new model in Q2 2025?",
        "Will Google announce a new AI product this month?",
        "Will Apple integrate AI features into iOS 19?",
    ],
    bad_examples=[
        "What did OpenAI announce?",
        "Who is the CEO of Google?",
        "When was ChatGPT released?",
    ],
    filter_=FilterCriteria(
        rubric="The question should be forward-looking and about future AI technology events",
        min_score=0.7,
    ),
    answer_type=answer_type,
)
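Conceptually, the filter step reduces to a threshold on a rubric score. The sketch below assumes scores lie in [0, 1] and that the cutoff is inclusive (`score >= min_score`); the `ScoredQuestion` type, the `apply_min_score` helper, and the scores themselves are hypothetical stand-ins, since the actual scoring is done by a model inside `lightningrod`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredQuestion:
    """Hypothetical stand-in: a candidate question and its rubric score in [0, 1]."""
    text: str
    score: float


def apply_min_score(candidates: List[ScoredQuestion], min_score: float) -> List[ScoredQuestion]:
    """Keep candidates whose rubric score is at least min_score (inclusive cutoff assumed)."""
    return [c for c in candidates if c.score >= min_score]


# Illustrative scores only; a forward-looking question should score high, a backward-looking one low.
candidates = [
    ScoredQuestion("Will OpenAI release a new model in Q2 2025?", 0.92),
    ScoredQuestion("Will Google announce a new AI product this month?", 0.74),
    ScoredQuestion("What did OpenAI announce?", 0.15),
]
kept = apply_min_score(candidates, min_score=0.7)
print([c.text for c in kept])
```

Raising `min_score` trades volume for quality: fewer questions survive, but each one matches the rubric more closely.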

## Configure Labeler and Renderer

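The `WebSearchLabeler` answers each generated question by searching the web, and `confidence_threshold` sets the confidence cutoff for its labels. The `QuestionRenderer` formats each question into the final prompt for the configured answer type.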

In [None]:
labeler = WebSearchLabeler(
    answer_type=answer_type,
    confidence_threshold=0.5,
)

renderer = QuestionRenderer(
    answer_type=answer_type,
)
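One way to picture `confidence_threshold=0.5` is as a split of labeled questions into accepted and rejected sets. This sketch assumes labels at or above the threshold are kept and the rest are dropped; that reading of the parameter is an assumption, and the `Label` type, `partition_by_confidence` helper, and placeholder questions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Label:
    """Hypothetical stand-in for a labeler result: a binary answer plus its confidence."""
    question: str
    answer: bool
    confidence: float


def partition_by_confidence(labels: List[Label], threshold: float) -> Tuple[List[Label], List[Label]]:
    """Split labels into (accepted, rejected) using confidence >= threshold."""
    accepted = [lab for lab in labels if lab.confidence >= threshold]
    rejected = [lab for lab in labels if lab.confidence < threshold]
    return accepted, rejected


# Placeholder questions and confidences, for illustration only.
labels = [
    Label("Question A", True, 0.82),
    Label("Question B", False, 0.41),
    Label("Question C", True, 0.50),
]
accepted, rejected = partition_by_confidence(labels, threshold=0.5)
print(f"accepted {len(accepted)}/{len(labels)}")  # → accepted 2/3
```

Tracking the rejected share is a quick way to tell whether the threshold is discarding too much of the dataset.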

## Run the Pipeline

Combine the four components into a `QuestionPipeline` and pass it to `client.transforms.run`; `max_questions=50` caps how many questions the run generates.

In [None]:
pipeline_config = QuestionPipeline(
    seed_generator=seed_generator,
    question_generator=question_generator,
    labeler=labeler,
    renderer=renderer,
)

dataset = client.transforms.run(pipeline_config, max_questions=50)
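The pipeline chains its stages: seed articles feed the question generator, each question is labeled, and labeled questions are rendered into prompts. The sketch below models that flow with plain functions to show how the stages compose and where `max_questions` could cut the run short. It is a conceptual illustration under those assumptions, not how `client.transforms.run` is implemented; `run_pipeline` and the `toy_*` functions are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_pipeline(
    seeds: Iterable[str],
    generate: Callable[[str], List[str]],
    label: Callable[[str], Optional[bool]],
    render: Callable[[str], str],
    max_questions: int,
) -> List[Tuple[str, bool]]:
    """Chain seed -> questions -> label -> rendered prompt, stopping at max_questions rows."""
    rows: List[Tuple[str, bool]] = []
    if max_questions <= 0:
        return rows
    for seed in seeds:
        for question in generate(seed):
            answer = label(question)
            if answer is None:  # labeler could not resolve this question; drop it
                continue
            rows.append((render(question), answer))
            if len(rows) >= max_questions:
                return rows
    return rows


# Toy stand-ins for the question generator, labeler, and renderer.
def toy_generate(seed: str) -> List[str]:
    return [f"Will event 1 from {seed} happen?", f"Will event 2 from {seed} happen?"]


def toy_label(question: str) -> Optional[bool]:
    if "event 2 from article B" in question:
        return None  # pretend the web search found no answer
    return "event 1" in question


def toy_render(question: str) -> str:
    return f"Question: {question}\nAnswer yes or no."


rows = run_pipeline(["article A", "article B", "article C"], toy_generate, toy_label, toy_render, max_questions=4)
print(len(rows))  # → 4
```

Because unresolved questions are dropped, the run may need more seeds than `max_questions` alone suggests.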

## Inspect Generated Samples

Print the first five samples to spot-check each seed article, generated question, label, and label confidence.

In [None]:
print(f"Generated dataset with {dataset.num_rows} samples\n")

samples = dataset.to_samples()
for i, sample in enumerate(samples[:5]):
    print(f"Sample {i+1}:")
    if sample.seed:
        print(f"  Seed: {sample.seed.seed_text[:100]}...")
    if sample.question:
        print(f"  Question: {sample.question.question_text}")
    if sample.label:
        print(f"  Answer: {sample.label.label}")
        print(f"  Confidence: {sample.label.label_confidence:.2f}")
    print()
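Beyond eyeballing five samples, a quick aggregate view can reveal skew, such as a binary dataset that is mostly one answer. The helper below uses only the attributes the cell above already reads (`sample.label.label` and `sample.label.label_confidence`); `summarize_labels` is a hypothetical helper, and the demo runs on `SimpleNamespace` stand-ins so it does not need an API key.

```python
from collections import Counter
from statistics import mean
from types import SimpleNamespace


def summarize_labels(samples):
    """Return (label counts, mean confidence) over samples that carry a label."""
    labels = [s.label for s in samples if getattr(s, "label", None) is not None]
    counts = Counter(str(lab.label) for lab in labels)
    avg_confidence = mean(lab.label_confidence for lab in labels) if labels else None
    return counts, avg_confidence


# Hypothetical stand-in samples shaped like the objects printed above.
demo_samples = [
    SimpleNamespace(label=SimpleNamespace(label=True, label_confidence=0.9)),
    SimpleNamespace(label=SimpleNamespace(label=False, label_confidence=0.6)),
    SimpleNamespace(label=None),
]
counts, avg_confidence = summarize_labels(demo_samples)
print(counts, avg_confidence)
```

On the real dataset, `summarize_labels(samples)` should work unchanged, provided each sample exposes the same attributes used in the inspection cell.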