# Quick Start

Generate binary forecasting questions from news articles in just a few steps.

> **Tip**: See [API.md](../API.md) for complete API reference and alternative configurations.

## Step 1: Install SDK

In [3]:
%pip install -e ../

from IPython.display import clear_output
clear_output()

## Step 2: Import and Initialize Client

The `LightningRod` client provides access to transforms, datasets, files, and filesets. See [API.md](../API.md) for all available methods.

In [None]:
import os
from lightningrod import LightningRod

api_key = os.getenv("LIGHTNINGROD_API_KEY")
if not api_key:
    raise ValueError("LIGHTNINGROD_API_KEY is not set")

client = LightningRod(api_key=api_key)

## Step 3: Configure Seed Generator

`NewsSeedGenerator` searches Google News for articles matching your query. The example below covers the last 30 days.

> **Alternatives**: Use `GdeltSeedGenerator` for large-scale datasets, or `FileSetSeedGenerator` for custom documents. See [API.md](../API.md) for details.

In [5]:
from datetime import datetime, timedelta
from lightningrod import NewsSeedGenerator

# Capture the current time once so both ends of the range share one reference point.
now = datetime.now()

seed_generator = NewsSeedGenerator(
    start_date=now - timedelta(days=30),
    end_date=now,
    interval_duration_days=7,
    search_query="technology announcements",
)
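
To make the date parameters concrete, here is a minimal sketch of how a 30-day range could be divided into 7-day windows. It assumes `interval_duration_days` means "split the range into windows of this many days"; the SDK's actual windowing is not described here, and `split_into_intervals` is a hypothetical helper, not part of `lightningrod`.

```python
from datetime import datetime, timedelta


def split_into_intervals(start, end, interval_days):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    step = timedelta(days=interval_days)
    window_start = start
    while window_start < end:
        # Clamp the final window so it never runs past the end date.
        window_end = min(window_start + step, end)
        yield window_start, window_end
        window_start = window_end


# Fixed dates keep the example reproducible; the notebook uses datetime.now().
end = datetime(2026, 1, 8)
start = end - timedelta(days=30)
windows = list(split_into_intervals(start, end, interval_days=7))
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())
```

Under this assumption, a 30-day range yields four full 7-day windows plus one shorter final window.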

## Step 4: Configure Question Generator

`QuestionGenerator` creates forecasting questions from the news articles, guided by the `instructions` you provide.

> **Note**: This example uses the `BINARY` answer type. Other types are `CONTINUOUS` (numeric), `MULTIPLE_CHOICE`, and `FREE_RESPONSE`. See examples 05-08.

In [6]:
from lightningrod import QuestionGenerator, AnswerType, AnswerTypeEnum

answer_type = AnswerType(answer_type=AnswerTypeEnum.BINARY)

question_generator = QuestionGenerator(
    instructions="Generate forward-looking questions about technology announcements.",
    answer_type=answer_type,
)

## Step 5: Configure Labeler and Renderer

The labeler finds answers using web search. The renderer formats the final prompt.

> **Note**: The renderer populates the `prompt` field in each sample. See [API.md](../API.md) for sample structure.

In [7]:
from lightningrod import WebSearchLabeler
from lightningrod._generated.models import QuestionRenderer

labeler = WebSearchLabeler(answer_type=answer_type)
renderer = QuestionRenderer(answer_type=answer_type)

## Step 6: Create Pipeline and Run

Combine all components and run to generate your dataset.

> **Alternative**: Use `client.transforms.submit()` to submit without waiting, then check status with `client.transforms.jobs.get(job_id)`. See [API.md](../API.md).

In [8]:
from lightningrod import QuestionPipeline

pipeline_config = QuestionPipeline(
    seed_generator=seed_generator,
    question_generator=question_generator,
    labeler=labeler,
    renderer=renderer,
)

dataset = client.transforms.run(pipeline_config, max_questions=10)
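
For the submit-without-waiting alternative mentioned above, a polling loop is one way to wait for completion. The sketch below is generic and does not call the SDK: `wait_until_done`, `fetch_status`, and `is_done` are hypothetical names, and the status strings are placeholders rather than values documented by the library.

```python
import time


def wait_until_done(fetch_status, is_done, poll_seconds=5.0, timeout_seconds=600.0):
    """Call fetch_status() until is_done(status) is true, then return the status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still not finished after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Stand-in for a real status lookup: reports "running" twice, then "completed".
# These status strings are placeholders, not values documented by the SDK.
fake_statuses = iter(["running", "running", "completed"])
final_status = wait_until_done(
    fetch_status=lambda: next(fake_statuses),
    is_done=lambda status: status == "completed",
    poll_seconds=0.01,
)
print(final_status)
```

With the real client, `fetch_status` would presumably wrap `client.transforms.jobs.get(job_id)`, and `is_done` would inspect whatever status field the returned job exposes; check [API.md](../API.md) for that structure.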

## Step 7: View Results

Inspect the generated questions and answers. Each sample contains `seed`, `question`, `label`, `prompt`, and optional `context` and `meta` fields. See [API.md](../API.md) for the complete sample structure.

In [9]:
%pip install pandas

from IPython.display import clear_output
clear_output()

In [10]:
import pandas as pd

# Download samples to memory
samples = dataset.download()
print(f"Generated {dataset.num_rows} samples\n")

# Convert cached samples to a list of dictionaries
rows = dataset.flattened()

df = pd.DataFrame(rows)
print(df.head())

Generated 10 samples

                              question.question_text   label.label  \
0  Will Samsung announce a specific retail availa...             0   
1  Will the ROG Swift OLED PG27UCWM support a 480...             1   
2  Will Meta integrate Manus’s autonomous agent t...             1   
3  Will Intel officially launch its Panther Lake ...             1   
4  Will Gemini 3 Flash remain the default model f...  Undetermined   

   label.label_confidence label.resolution_date  \
0                     1.0   2026-01-07T00:00:00   
1                     1.0   2026-01-04T00:00:00   
2                     1.0   2025-12-29T00:00:00   
3                     1.0   2026-01-05T00:00:00   
4                     0.9                   NaN   

                                     label.reasoning  \
0  Samsung did not announce a specific retail ava...   
1  The ASUS ROG Swift OLED PG27UCWM monitor expli...   
2  Meta announced its acquisition of Manus in Dec...   
3  Multiple sources confir
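
The output above shows that `label.label` can hold `Undetermined` alongside `0` and `1`. As a minimal sketch, assuming the flattened rows are plain dictionaries keyed by the column names shown, the hypothetical helper below keeps only resolved binary labels above a confidence threshold. It is an illustration, not part of the SDK.

```python
def resolved_binary_rows(rows, min_confidence=0.0):
    """Keep rows whose label is 0 or 1 and whose confidence meets the threshold."""
    kept = []
    for row in rows:
        label = str(row.get("label.label", "")).strip()
        if label not in {"0", "1"}:
            continue  # skips "Undetermined" and any other non-binary label
        confidence = row.get("label.label_confidence")
        # "not >=" also rejects NaN confidences, which compare false to everything.
        if confidence is None or not float(confidence) >= min_confidence:
            continue
        kept.append({**row, "label.label": int(label)})
    return kept


# Labels and confidences copied from the output above; other fields omitted.
example_rows = [
    {"label.label": 0, "label.label_confidence": 1.0},
    {"label.label": 1, "label.label_confidence": 1.0},
    {"label.label": 1, "label.label_confidence": 1.0},
    {"label.label": 1, "label.label_confidence": 1.0},
    {"label.label": "Undetermined", "label.label_confidence": 0.9},
]
resolved = resolved_binary_rows(example_rows, min_confidence=0.95)
print(f"{len(resolved)} of {len(example_rows)} rows resolved")
```

Normalizing the label through `str` lets the same check handle labels stored as integers or as strings.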
