# News Search as Data Source

This notebook demonstrates how to use news search (based on Google News) as a data source for generating forecasting questions. It gives you access to recent news articles, which you can filter by date range and search query.

In [1]:
%pip install -e ..
%pip install python-dotenv

from IPython.display import clear_output
clear_output()

In [2]:
import os
from dotenv import load_dotenv
from lightningrod import LightningRod

load_dotenv()

api_key = os.getenv("LIGHTNINGROD_API_KEY")
base_url = os.getenv("LIGHTNINGROD_BASE_URL", "https://api.lightningrod.ai/api/public/v1")

if not api_key:
    raise ValueError("LIGHTNINGROD_API_KEY is not set")

# Note: base_url param can be omitted
client = LightningRod(api_key=api_key, base_url=base_url)

## Configure News Seed Generator

The `NewsSeedGenerator` searches Google News for articles matching your query. You can specify date ranges, search queries, and how many articles to fetch per interval.

In [3]:
from datetime import datetime
from lightningrod import NewsSeedGenerator

seed_generator = NewsSeedGenerator(
    start_date=datetime(2025, 1, 1),
    end_date=datetime(2025, 1, 31),
    interval_duration_days=7,
    search_query="AI technology announcements",
    articles_per_search=20,
)
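
To get a feel for how much data this configuration may request, the sketch below splits the same date range into 7-day windows and multiplies by `articles_per_search`. The helper `split_into_intervals` is hypothetical and is not part of `lightningrod`; how the library actually places window boundaries, and whether it shortens the final window, is an assumption here.

```python
from datetime import datetime, timedelta

def split_into_intervals(start, end, interval_days):
    """Yield (window_start, window_end) pairs that cover start..end.

    Hypothetical helper for illustration only; the final window is
    shortened so that it never extends past `end`.
    """
    step = timedelta(days=interval_days)
    window_start = start
    while window_start < end:
        window_end = min(window_start + step, end)
        yield window_start, window_end
        window_start = window_end

# Same values as the NewsSeedGenerator configuration above
windows = list(split_into_intervals(datetime(2025, 1, 1), datetime(2025, 1, 31), 7))
for window_start, window_end in windows:
    print(f"{window_start:%Y-%m-%d} -> {window_end:%Y-%m-%d}")

# Upper bound on fetched articles if every window returns a full page
articles_per_search = 20
max_articles = len(windows) * articles_per_search
print(f"{len(windows)} windows, at most {max_articles} articles")
```

Under these assumptions, January 2025 yields five windows (the last one covers only two days), so the search would fetch at most 100 articles before question generation and filtering.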

## Configure Question Generator

The question generator creates forecasting questions from the news articles. Use `examples` and `bad_examples` to steer the model toward forward-looking questions, and `FilterCriteria` to enforce quality (the cell below does not configure one).

In [4]:
from lightningrod import AnswerType, AnswerTypeEnum, QuestionGenerator, WebSearchLabeler, QuestionRenderer, QuestionPipeline

answer_type = AnswerType(answer_type=AnswerTypeEnum.BINARY)

question_generator = QuestionGenerator(
    instructions=(
        "Generate forward-looking questions about AI technology announcements. "
        "Questions should be about future events or outcomes that can be verified later."
    ),
    examples=[
        "Will OpenAI release a new model in Q2 2025?",
        "Will Google announce a new AI product this month?",
        "Will Apple integrate AI features into iOS 19?",
    ],
    bad_examples=[
        "What did OpenAI announce?",
        "Who is the CEO of Google?",
        "When was ChatGPT released?",
    ],
    answer_type=answer_type,
)

# Labeler automatically finds answers to questions using web search
labeler = WebSearchLabeler(
    answer_type=answer_type,
    confidence_threshold=0.5,
)

# Renderer formats the question output
renderer = QuestionRenderer(
    answer_type=answer_type,
)

pipeline_config = QuestionPipeline(
    seed_generator=seed_generator,
    question_generator=question_generator,
    labeler=labeler,
    renderer=renderer,
)

## Run the Pipeline

Pass the `QuestionPipeline` configured above to `client.transforms.run` to generate your dataset.

In [6]:
dataset = client.transforms.run(pipeline_config, max_questions=10) # keep max questions low when testing

> Note: This can take a few minutes to complete processing.

## View Results

Inspect the generated questions and answers. Each sample contains `seed`, `question`, `label`, `prompt`, and optional `context` and `meta` fields. See [API.md](../API.md) for the complete sample structure.
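
The flattened rows shown later in this notebook use dotted column names such as `label.label_confidence`. As a rough sketch of how such names can arise from a nested sample, the function below flattens nested dictionaries into dotted keys. The `flatten` helper and the nested layout of `sample` are assumptions made for illustration, and the string values are placeholders; API.md remains the authoritative description of the sample structure.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one dict whose keys are dotted paths."""
    flat = {}
    for key, value in record.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as "label" or "meta"
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat

# Hypothetical nested sample; only the field names mirror the notebook output
sample = {
    "question": {"question_text": "<question text>"},
    "label": {"label": 0, "label_confidence": 0.9},
    "meta": {"sample_id": "<uuid>", "filter_reason": None},
    "is_valid": True,
}

flat = flatten(sample)
for name in sorted(flat):
    print(name, "=", flat[name])
```

Top-level scalar fields such as `is_valid` keep their plain names, while nested fields gain a prefix for each level.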

In [None]:
%pip install pandas

from IPython.display import clear_output
clear_output()

In [7]:
import pandas as pd

# Download samples to memory
samples = dataset.download()
print(f"Generated {dataset.num_rows} samples\n")

# Convert cached samples to a list of dictionaries
rows = dataset.flattened()

df = pd.DataFrame(rows)
df

Generated 12 samples

index,question.question_text,label.label,label.label_confidence,label.resolution_date,label.reasoning,label.answer_sources,prompt,seed.seed_text,seed.url,seed.seed_creation_date,seed.search_query,is_valid,meta.sample_id,meta.parent_sample_id,meta.processing_time_ms,meta.filter_reason
0,Will the Asian Journal of Law and Society publ...,1,1.0,2025-01-15T00:00:00,The Asian Journal of Law and Society is defini...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill the Asian Journal of Law and S...,Title: Special Issue on AI Sovereignty and Int...,https://www.cambridge.org/core/journals/asian-...,2025-01-31T00:00:00,AI technology announcements,True,45320d89-977d-4f29-9254-97c925a9554b,e0cd79f4-a517-4201-b7c8-41a8160adbc1,9390.237,
1,Will Oracle announce a general availability re...,1,0.9,2025-10-14T00:00:00,Oracle made significant announcements regardin...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill Oracle announce a general avai...,Title: Oracle debuts new AI agents as artifici...,https://finance.yahoo.com/news/oracle-debuts-n...,2025-01-30T00:00:00,AI technology announcements,True,c5546940-d9a0-40d0-bc9a-14dc363defde,4938598a-c929-4918-be2c-820e00b0bf08,16775.201,
2,Will Microsoft release Project Spark to the ge...,0,1.0,2016-05-13T00:00:00,Microsoft officially released Project Spark on...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill Microsoft release Project Spar...,Title: Advancing education to prepare for an A...,https://www.microsoft.com/en-us/education/blog...,2025-01-30T00:00:00,AI technology announcements,True,64d0a7e6-48b7-404d-8dce-d2b7a233e962,e1094f33-c976-4e7a-958b-b556359f7e42,8457.427,
3,Will Alibaba's Qwen2.5-Max model reach the #1 ...,0,0.9,2025-02-06T00:00:00,Alibaba's Qwen2.5-Max model did not reach the ...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill Alibaba's Qwen2.5-Max model re...,Title: Alibaba claims its AI model trounces De...,https://www.livescience.com/technology/artific...,2025-01-29T00:00:00,AI technology announcements,True,75489412-fa2c-49ec-808a-653b9fda038b,87b1a593-d9d6-42d2-a441-18070ca54955,18403.397,
4,Will NYCEDC successfully place CUNY students i...,1,1.0,2025-08-27T00:00:00,The New York City Economic Corporation (NYCEDC...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill NYCEDC successfully place CUNY...,"Title: Mayor Adams, NYCEDC Release First-Of-It...",https://www.nyc.gov/mayors-office/news/2025/01...,2025-01-31T00:00:00,AI technology announcements,True,f4e2353c-3700-4d1a-b6eb-98f80c7d6d95,1f2c3445-7cc1-42b5-b327-f1384c6f2906,10140.285,
5,Will Meta end the year 2025 with at least 1.3 ...,Undetermined,0.6,,"Mark Zuckerberg, CEO of Meta, stated in a Face...",https://vertexaisearch.cloud.google.com/ground...,,Title: Meta Plans Record $65bn AI Investment a...,https://technologymagazine.com/articles/metas-...,2025-01-30T00:00:00,AI technology announcements,False,252038e1-9ca6-4283-89c9-de933cba3270,c450a2f2-1742-44b3-ba1d-eba87fb8ca00,19346.681,Undetermined label
6,Will a third-party audit or independent resear...,0,0.9,2025-01-28T00:00:00,DeepSeek itself claimed a training cost of app...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill a third-party audit or indepen...,Title: China's DeepSeek faces questions over c...,https://www.aljazeera.com/news/2025/1/29/ai-ga...,2025-01-29T00:00:00,AI technology announcements,True,31e28292-78fd-4a2e-a7e7-a1e26f2430c8,5cfe0d04-657a-4bd8-8949-12a57ded5b6f,17161.596,
7,Will VideaHealth announce the official release...,1,1.0,2025-07-30T00:00:00,VideaHealth officially announced the launch of...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill VideaHealth announce the offic...,Title: VideaHealth Raises $40M in Oversubscrib...,https://www.businesswire.com/news/home/2025012...,2025-01-29T00:00:00,AI technology announcements,True,9da80cde-dfdd-4449-871e-2b0ea4cb6830,a62c404e-cac4-433e-af3a-7fe90ef965ba,6583.71,
8,Will DeepSeek release a new open-source large ...,Undetermined,0.4,,The question asks whether DeepSeek released a ...,https://vertexaisearch.cloud.google.com/ground...,,Title: Access to this page has been denied.\n\...,https://www.investors.com/news/technology/ai-s...,2025-01-30T00:00:00,AI technology announcements,False,2c7ff91b-dd4f-46dd-ab33-a8032a8283af,4b1d67c3-2c0a-48f5-b9f7-54671e3e645a,30137.668,Undetermined label
9,"Will Arteria AI's research arm, Arteria Café, ...",Undetermined,0.7,,Arteria AI announced the launch of its dedicat...,https://vertexaisearch.cloud.google.com/ground...,,Title: Arteria AI Launches New Research Arm to...,https://www.businesswire.com/news/home/2025012...,2025-01-29T00:00:00,AI technology announcements,False,5a108ab6-6c3e-44bc-94bd-c7707f937e5b,09f0c491-72b1-405d-b48c-e5a4e08e83b7,12470.074,Undetermined label
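
Before using the dataset, it can help to drop invalid rows and to flag questions whose answer was already settled when the article appeared. For instance, row 2 above resolves in 2016 although its seed is dated January 2025. The sketch below works on a list of dictionaries like the one returned by `dataset.flattened()`. The four stand-in rows are copied from the output above and reduced to the columns used here; the `row` key, the value types, and the reading of `seed.seed_creation_date` as the article date are assumptions.

```python
from datetime import datetime

# Stand-in for `rows = dataset.flattened()`. "row" is an illustration-only key
# holding the table index from the output above; value types are assumed.
rows = [
    {"row": 0, "label.label": "1", "label.label_confidence": 1.0,
     "label.resolution_date": "2025-01-15T00:00:00",
     "seed.seed_creation_date": "2025-01-31T00:00:00",
     "is_valid": True, "meta.filter_reason": None},
    {"row": 1, "label.label": "1", "label.label_confidence": 0.9,
     "label.resolution_date": "2025-10-14T00:00:00",
     "seed.seed_creation_date": "2025-01-30T00:00:00",
     "is_valid": True, "meta.filter_reason": None},
    {"row": 2, "label.label": "0", "label.label_confidence": 1.0,
     "label.resolution_date": "2016-05-13T00:00:00",
     "seed.seed_creation_date": "2025-01-30T00:00:00",
     "is_valid": True, "meta.filter_reason": None},
    {"row": 5, "label.label": "Undetermined", "label.label_confidence": 0.6,
     "label.resolution_date": None,
     "seed.seed_creation_date": "2025-01-30T00:00:00",
     "is_valid": False, "meta.filter_reason": "Undetermined label"},
]

def resolved_before_seed(row):
    """Return True when the resolution date precedes the seed date."""
    resolved = row["label.resolution_date"]
    if not resolved:
        return False  # no resolution date recorded, nothing to compare
    seed_date = datetime.fromisoformat(row["seed.seed_creation_date"])
    return datetime.fromisoformat(resolved) < seed_date

# Keep only rows that passed the pipeline's own validity check
valid_rows = [row for row in rows if row["is_valid"]]

# Among those, list rows that may not be genuinely forward-looking
suspect_rows = [row["row"] for row in valid_rows if resolved_before_seed(row)]

print(f"{len(valid_rows)} valid rows, suspect rows: {suspect_rows}")
```

Whether a flagged row should be discarded depends on the use case; the check only surfaces candidates for manual review.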
