# Google News as Data Source

This notebook shows how to use Google News search as a data source for generating forecasting questions. Recent articles returned by the search serve as seeds for question generation.

In [None]:
%pip install lightningrod-ai

from IPython.display import clear_output
clear_output()

In [None]:
import os
from lightningrod import LightningRod

api_key = os.getenv("LIGHTNINGROD_API_KEY")
if not api_key:
    raise ValueError("LIGHTNINGROD_API_KEY is not set")

client = LightningRod(api_key=api_key)

## Configure News Seed Generator

The `NewsSeedGenerator` searches Google News for articles matching your query. You can specify date ranges, search queries, and how many articles to fetch per interval.

In [None]:
from datetime import datetime
from lightningrod import NewsSeedGenerator

seed_generator = NewsSeedGenerator(
    start_date=datetime(2025, 1, 1),
    end_date=datetime(2025, 1, 31),
    interval_duration_days=7,
    search_query="AI technology announcements",
    articles_per_search=20,
)
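
To get a feel for how many searches the configuration above implies, the sketch below splits the same date range into 7-day windows using only the standard library. It is an illustration, not the library's implementation: the helper `split_into_intervals` is hypothetical, and how `NewsSeedGenerator` actually treats window boundaries or a short final window is an assumption.

```python
from datetime import datetime, timedelta


def split_into_intervals(start, end, days):
    """Yield (window_start, window_end) pairs covering start..end.

    Hypothetical helper: the last window is clipped to `end`, which may
    not match how the service handles a partial interval.
    """
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        yield cursor, min(cursor + step, end)
        cursor += step


windows = list(
    split_into_intervals(datetime(2025, 1, 1), datetime(2025, 1, 31), 7)
)
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())

# Upper bound on fetched articles if each window returns 20 results.
max_articles = len(windows) * 20
print(f"{len(windows)} windows, at most {max_articles} articles")
```

Under these assumptions the range yields five windows, so `articles_per_search=20` would cap the seed set at roughly 100 articles.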

## Configure Question Generator

The question generator creates forecasting questions from the news articles. Use `examples` and `bad_examples` to guide the model, and `FilterCriteria` to ensure quality.

In [None]:
from lightningrod import AnswerType, AnswerTypeEnum, QuestionGenerator, FilterCriteria

answer_type = AnswerType(answer_type=AnswerTypeEnum.BINARY)

question_generator = QuestionGenerator(
    instructions=(
        "Generate forward-looking questions about AI technology announcements. "
        "Questions should be about future events or outcomes that can be verified later."
    ),
    examples=[
        "Will OpenAI release a new model in Q2 2025?",
        "Will Google announce a new AI product this month?",
        "Will Apple integrate AI features into iOS 19?",
    ],
    bad_examples=[
        "What did OpenAI announce?",
        "Who is the CEO of Google?",
        "When was ChatGPT released?",
    ],
    filter_=FilterCriteria(
        rubric="The question should be forward-looking and about future AI technology events",
        min_score=0.7
    ),
    answer_type=answer_type,
)
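
As a rough mental model of what `min_score=0.7` does, the sketch below applies a score threshold to a few hand-written candidates. The scores are invented for illustration, and both the scoring scale and whether the cutoff is inclusive are assumptions; the real scoring happens inside the service against the rubric.

```python
# Hypothetical (question, rubric score) pairs. Real scores are produced
# by the service, not supplied by the caller.
scored_candidates = [
    ("Will OpenAI release a new model in Q2 2025?", 0.92),
    ("What did OpenAI announce?", 0.15),
    ("Will Google announce a new AI product this month?", 0.70),
]

MIN_SCORE = 0.7  # mirrors min_score in FilterCriteria above

# Assumption: a candidate passes when its score is at or above the cutoff.
kept = [
    question
    for question, score in scored_candidates
    if score >= MIN_SCORE
]

for question in kept:
    print(question)
```

A higher `min_score` trades dataset size for stricter adherence to the rubric, so it is worth checking how many questions survive before raising it.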

## Configure Labeler and Renderer

The labeler automatically finds answers to questions using web search, and the renderer formats the final prompt.

In [None]:
from lightningrod import WebSearchLabeler, QuestionRenderer

labeler = WebSearchLabeler(
    answer_type=answer_type,
    confidence_threshold=0.5,
)

renderer = QuestionRenderer(
    answer_type=answer_type,
)

## Run the Pipeline

Combine all components into a `QuestionPipeline` and run it to generate your dataset.

In [None]:
from lightningrod import QuestionPipeline

pipeline_config = QuestionPipeline(
    seed_generator=seed_generator,
    question_generator=question_generator,
    labeler=labeler,
    renderer=renderer,
)

dataset = client.transforms.run(pipeline_config, max_questions=50)

## View Results

Inspect the generated questions and answers. Each sample contains `seed`, `question`, `label`, `prompt`, and optional `context` and `meta` fields. See [API.md](../API.md) for the complete sample structure.

In [None]:
%pip install pandas

from IPython.display import clear_output
clear_output()

In [None]:
import pandas as pd

# Download samples to memory
samples = dataset.download()
print(f"Generated {dataset.num_rows} samples\n")

# Convert cached samples to a list of dictionaries
rows = dataset.flattened()

df = pd.DataFrame(rows)
print(df.head())
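
A quick sanity check on a binary dataset is the balance between labels. The sketch below counts labels over a list of dictionaries using only the standard library. The rows are hand-written stand-ins: the key names `question` and `label` mirror the sample fields listed above, but the exact keys and label values in the flattened output are assumptions.

```python
from collections import Counter

# Hypothetical rows standing in for the flattened output. Real key names
# and label values may differ.
example_rows = [
    {"question": "Will OpenAI release a new model in Q2 2025?", "label": "yes"},
    {"question": "Will Google announce a new AI product this month?", "label": "no"},
    {"question": "Will Apple integrate AI features into iOS 19?", "label": "yes"},
]

# Count how often each label occurs and report its share of the dataset.
label_counts = Counter(row["label"] for row in example_rows)
for label, count in label_counts.most_common():
    print(f"{label}: {count} ({count / len(example_rows):.0%})")
```

A heavily skewed split can be a sign that the search query or the question instructions favor one outcome.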