# Top Aggregated News as Data Source

This notebook demonstrates how to use an aggregated dataset of top news and events (based on [GDELT](https://www.gdeltproject.org/)) as a data source for generating forecasting questions. GDELT is a large database of global news articles, so it offers broader coverage than a standard news search.

In [10]:
%pip install -e ..
%pip install python-dotenv

from IPython.display import clear_output
clear_output()

import os
from dotenv import load_dotenv
from lightningrod import LightningRod

load_dotenv()

api_key = os.getenv("LIGHTNINGROD_API_KEY")
base_url = os.getenv("LIGHTNINGROD_BASE_URL", "https://api.lightningrod.ai/api/public/v1")

if not api_key:
    raise ValueError("LIGHTNINGROD_API_KEY is not set")

# Note: base_url param can be omitted
client = LightningRod(api_key=api_key, base_url=base_url)

## Configure GDELT Seed Generator


The `GdeltSeedGenerator` fetches articles in batches at intervals defined by `interval_duration_days`. It does not fetch articles for every day unless you set `interval_duration_days=1`; instead, it steps forward by the specified interval between batches.
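To make the stepping behavior concrete, here is a minimal sketch in plain Python that lists the interval start dates for the configuration used in the next cell. It does not call the library. The helper `interval_starts` is hypothetical, and it assumes that stepping begins at `start_date` and that `end_date` is inclusive; the actual boundary handling inside `GdeltSeedGenerator` may differ.

```python
from datetime import datetime, timedelta


def interval_starts(start_date, end_date, interval_duration_days):
    """Yield the start of each interval, stepping forward by a fixed number of days.

    Hypothetical helper for illustration only; assumes end_date is inclusive.
    """
    step = timedelta(days=interval_duration_days)
    current = start_date
    while current <= end_date:
        yield current
        current += step


starts = list(interval_starts(datetime(2025, 1, 1), datetime(2025, 1, 31), 7))
for start in starts:
    print(start.date())

# Rough upper bound on fetched articles under the same assumptions
articles_per_interval = 1000
max_articles = len(starts) * articles_per_interval
print(f"{len(starts)} intervals, at most {max_articles} articles")
```

Under these assumptions, a 7-day interval over January 2025 gives five batches, so `articles_per_interval=1000` caps the seed pool at roughly 5,000 articles rather than 1,000 per day.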

In [11]:
from datetime import datetime
from lightningrod import (
    AnswerType,
    AnswerTypeEnum,
    FilterCriteria,
    GdeltSeedGenerator,
    QuestionGenerator,
    QuestionPipeline,
    QuestionRenderer,
    WebSearchLabeler,
)

gdelt_seed_generator = GdeltSeedGenerator(
    start_date=datetime(2025, 1, 1),
    end_date=datetime(2025, 1, 31),
    interval_duration_days=7,
    articles_per_interval=1000,
)

answer_type = AnswerType(answer_type=AnswerTypeEnum.BINARY)

question_generator = QuestionGenerator(
    instructions=(
        "Generate forward-looking questions about global events and international news. "
        "Questions should focus on future outcomes that can be verified."
    ),
    examples=[
        "Will the conflict in region X escalate in the next month?",
        "Will country Y sign the trade agreement this quarter?",
        "Will the international summit achieve its stated goals?",
    ],
    bad_examples=[
        "What happened in the conflict?",
        "When was the trade agreement signed?",
        "Who attended the summit?",
    ],
    filter_=FilterCriteria(
        rubric="The question should be forward-looking and about future global events",
        min_score=0.7
    ),
    answer_type=answer_type,
)

# Labeler automatically finds answers to questions using web search
labeler = WebSearchLabeler(
    answer_type=answer_type,
    confidence_threshold=0.5,
)

# Renderer formats the question output
renderer = QuestionRenderer(
    answer_type=answer_type,
)

pipeline_config = QuestionPipeline(
    seed_generator=gdelt_seed_generator,
    question_generator=question_generator,
    labeler=labeler,
    renderer=renderer,
)

## Run the Pipeline

The pipeline runs the same way as it does with Google News; GDELT is just a different data source.

> Note: Processing can take a few minutes to complete.

In [12]:

dataset = client.transforms.run(pipeline_config, max_questions=10) # keep max questions low when testing

In [13]:
%pip install pandas

from IPython.display import clear_output
clear_output()

In [14]:
import pandas as pd

# Download samples to memory
samples = dataset.download()
print(f"Generated {dataset.num_rows} samples\n")

# Convert cached samples to a list of dictionaries
rows = dataset.flattened()

df = pd.DataFrame(rows)
df

Generated 10 samples



Unnamed: 0,question.question_text,label.label,label.label_confidence,label.resolution_date,label.reasoning,label.answer_sources,prompt,seed.seed_text,seed.url,seed.seed_creation_date,is_valid,meta.sample_id,meta.filter_score,meta.parent_sample_id,meta.processing_time_ms,meta.filter_reason
0,Will the Los Angeles Fire Department announce ...,0,1.0,2025-01-15T00:00:00,"Based on the search results, neither the Eaton...",https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill the Los Angeles Fire Departmen...,Title: Eaton Canyon fire: A second wind-whippe...,https://www.thehindu.com/news/international/a-...,2025-01-08T00:00:00,True,63b3a11f-3296-4fb6-a693-acf4948169bb,1.0,bba6a293-4daa-43e3-8a25-431bbc0c08e5,12049.504,
1,Will Melania Trump make an official public app...,1,1.0,2025-02-12T00:00:00,Melania Trump made an official public appearan...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill Melania Trump make an official...,Title: Melania Trump's 'Nun'-inspired outfit a...,https://timesofindia.indiatimes.com/world/us/m...,2025-01-09T00:00:00,True,a3e9fb07-64dd-4517-891a-d301d47c3ac9,1.0,7ff94dbc-d30a-4e32-a677-e5c16d1c0ee5,14274.97,
2,Will Donald Trump’s conviction in the Manhatta...,Undetermined,1.0,,Donald Trump was convicted in the Manhattan hu...,https://vertexaisearch.cloud.google.com/ground...,,Title: Donald Trump sentenced to 'unconditiona...,https://www.thehindu.com/news/international/do...,2025-01-10T00:00:00,False,75b80af5-8f45-4d70-b9f2-5e78d7e153ca,1.0,5c299061-74c1-4c5c-85d3-516443ea1fe6,16349.747,Undetermined label
3,Will Donald Trump be sentenced to a term of im...,0,1.0,2025-01-10T00:00:00,Donald Trump was not sentenced to a term of im...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill Donald Trump be sentenced to a...,"Title: ABC News – Breaking News, Latest News a...",https://abcnews.go.com/Politics/wireStory/trum...,2025-01-10T00:00:00,True,082c474f-5b58-4f15-be97-db5a029ec891,1.0,8ef0c4c9-349b-42f8-9558-8640a1d27dab,8920.113,
4,Will Joe Biden meet with Pope Francis in the V...,0,1.0,2025-01-08T00:00:00,Joe Biden's term as US President officially co...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill Joe Biden meet with Pope Franc...,Title: Joe Biden cancels final trip to Italy a...,https://timesofindia.indiatimes.com/world/us/j...,2025-01-09T00:00:00,True,a2a50f56-df03-4271-bade-bbdccb8e32cb,1.0,38c71eea-5569-4016-9136-83cb3820589a,17652.55,
5,Will the wildfire in the Pacific Palisades nei...,0,1.0,2025-01-15T00:00:00,"The Palisades Fire, which started on January 7...",https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill the wildfire in the Pacific Pa...,Title: Fierce firestorms rage through parched ...,https://www.bignewsnetwork.com/news/274920322/...,2025-01-08T00:00:00,True,9f8a68b7-8d39-45eb-b0a9-01de4b28ceed,1.0,37263a99-89c1-46b9-ba78-793c79d8ed3b,12562.404,
6,Will the FDA finalize the rule to limit nicoti...,0,0.9,2026-01-01T00:00:00,The FDA proposed a rule to limit nicotine leve...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill the FDA finalize the rule to l...,Title: FDA proposes limiting nicotine levels i...,https://abcnews.go.com/Health/fda-proposes-red...,2025-01-15T00:00:00,True,642b26be-8076-4eb1-8559-8893fa2bde6a,1.0,ec0cd1d5-5573-4f4d-90d9-382ecd431259,17491.686,
7,Will a New York appeals court overturn Donald ...,0,1.0,2025-12-31T00:00:00,Donald Trump was convicted in the hush money c...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill a New York appeals court overt...,Title: Judge sentences Trump in hush money cas...,https://economictimes.indiatimes.com/news/inte...,2025-01-10T00:00:00,True,ab61a382-f2d6-4b1a-abe8-652da6f84353,1.0,f1f9b5aa-11a7-42e4-b10a-3f28f4f0a68e,33061.534,
8,Will the U.S. Navy conduct a missing man forma...,1,1.0,2025-01-09T00:00:00,The U.S. Navy conducted a 21-aircraft missing ...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill the U.S. Navy conduct a missin...,Title: Jimmy Carter state funeral service: How...,https://economictimes.indiatimes.com/news/inte...,2025-01-09T00:00:00,True,65f1666e-8517-48f9-bf67-b50285326da2,1.0,4751ad08-39b3-4149-8e7e-ede4e10e4356,8985.219,
9,Will the Palisades Fire in Southern California...,0,0.8,,The Palisades Fire in Southern California star...,https://vertexaisearch.cloud.google.com/ground...,QUESTION:\nWill the Palisades Fire in Southern...,Title: Wildfires in California: Why did these ...,https://economictimes.indiatimes.com/news/inte...,2025-01-08T00:00:00,True,3bbb452b-abc5-447f-afa3-4a22622743c8,1.0,b61c9370-5bc6-4193-8cba-473af37c0789,16148.104,
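In the output above, one row has `is_valid` set to `False` with the filter reason `Undetermined label`, and two rows have a label confidence below 1.0. The following is a minimal sketch of how such rows could be screened out before using the data. It works on a list of dictionaries, the shape that the comment on `dataset.flattened()` describes. The sample rows and the `select_confident` helper are hypothetical and only reuse the column names shown in the table; the 0.9 cutoff is an arbitrary choice for illustration.

```python
# Hypothetical rows that mimic the column names in the output above.
example_rows = [
    {
        "question.question_text": "Example question A?",
        "label.label": "0",
        "label.label_confidence": 1.0,
        "is_valid": True,
        "meta.filter_reason": None,
    },
    {
        "question.question_text": "Example question B?",
        "label.label": "Undetermined",
        "label.label_confidence": 1.0,
        "is_valid": False,
        "meta.filter_reason": "Undetermined label",
    },
    {
        "question.question_text": "Example question C?",
        "label.label": "0",
        "label.label_confidence": 0.8,
        "is_valid": True,
        "meta.filter_reason": None,
    },
]


def select_confident(rows, min_confidence=0.9):
    """Keep rows that are valid and whose label confidence meets the cutoff."""
    return [
        row
        for row in rows
        if row["is_valid"] and row["label.label_confidence"] >= min_confidence
    ]


kept = select_confident(example_rows)
for row in kept:
    print(row["question.question_text"], row["label.label"])
```

With real data, the same function could be applied to `rows` from the cell above, provided the flattened keys match the column names in the DataFrame.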


## When to use Top Aggregated News vs News Search

**Use `GdeltSeedGenerator` when:**
- You need access to a very large number of articles
- You're analyzing global or international events
- You need historical data
- You want broader coverage across many sources

**Use `NewsSeedGenerator` when:**
- You need recent, curated news articles
- You want more control over search queries
- You're working with smaller, focused datasets
- You need faster iteration on specific topics