# Quick Start Guide

Generate forecasting datasets from news articles in minutes.

Lightning Rod automatically creates training data by:
1. Collecting news articles from a time period
2. Generating forecasting questions from those articles
3. Finding the answers automatically using web search
4. Returning a dataset ready for model training

This uses the "future as label" approach. Each question asks about an event that was still in the future when its source article was published, and what actually happened afterward becomes the ground-truth label.
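
Here is a minimal sketch of the idea in plain Python, independent of the SDK. The `ForecastQuestion` and `Evidence` dataclasses, the `resolve_label` helper, and the dates are all hypothetical. What matters is the temporal ordering. A question is posed as of a date, and only evidence dated after that point may serve as its label. Anything earlier was already knowable and would leak the answer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ForecastQuestion:
    """Hypothetical record: a question posed as of its article's publication date."""
    text: str
    asked_on: date


@dataclass
class Evidence:
    """Hypothetical record of an outcome that was observed on a given date."""
    occurred_on: date
    outcome: bool


def resolve_label(question: ForecastQuestion, evidence: list[Evidence]) -> Optional[bool]:
    """Label a question using the earliest evidence dated strictly after it was asked.

    Evidence on or before the question date is ignored to avoid label leakage.
    Returns None when nothing qualifies, i.e. the question is still unresolved.
    """
    later = [e for e in evidence if e.occurred_on > question.asked_on]
    if not later:
        return None
    return min(later, key=lambda e: e.occurred_on).outcome


example_question = ForecastQuestion("Will the product ship by March 31?", asked_on=date(2025, 1, 10))
print(resolve_label(example_question, [Evidence(date(2025, 1, 5), False), Evidence(date(2025, 3, 20), True)]))  # True
print(resolve_label(example_question, [Evidence(date(2025, 1, 5), False)]))  # None
```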

## Install the SDK

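```python
# Editable install of the SDK from the parent directory
# (this assumes the notebook sits inside a checkout of the repository)
%pip install -e ..

# Hide the pip log
from IPython.display import clear_output
clear_output()
```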

## Set up the client

Set your API key as an environment variable `LIGHTNINGROD_API_KEY` or pass it directly to the constructor.

```python
import os
from lightningrod import LightningRod

# Read the key from the environment and fail early if it is missing
api_key = os.getenv("LIGHTNINGROD_API_KEY")
if not api_key:
    raise ValueError("LIGHTNINGROD_API_KEY is not set")

client = LightningRod(api_key=api_key)
```

```text
Client initialized with base URL: https://lightningrod-api-staging-918054920018.us-central1.run.app/api/public/v1
```

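The cell above reads the key only from the environment. To support both options (an explicit key or the environment variable), you could write a small helper that prefers an explicitly passed key and falls back to `LIGHTNINGROD_API_KEY`. `resolve_api_key` is a hypothetical helper and is not part of the SDK. You would pass its result as `api_key=` to `LightningRod`.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env_var: str = "LIGHTNINGROD_API_KEY") -> str:
    """Return the explicit key if given, otherwise the environment variable.

    Raises ValueError when neither is set, so a missing key is reported
    before any client is constructed.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key: pass one explicitly or set {env_var}")
    return key
```

For example, `client = LightningRod(api_key=resolve_api_key())` reads the environment, and `resolve_api_key("my-key")` returns the explicit value unchanged.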

## Build a pipeline

A pipeline has three components:

1. **Seed Generator** - Collects news articles from a time period
2. **Question Generator** - Creates forecasting questions from the articles
3. **Labeler** - Finds the answers automatically using web search

Let's build a simple pipeline:

```python
from datetime import datetime, timedelta
from lightningrod import (
    NewsSeedGenerator,
    QuestionGenerator,
    WebSearchLabeler,
    QuestionPipeline,
    AnswerType,
    AnswerTypeEnum,
)

# 1. Seed generator: news articles matching the query from the last 30 days
end_date = datetime.now()
seed_generator = NewsSeedGenerator(
    start_date=end_date - timedelta(days=30),
    end_date=end_date,
    search_query="technology announcements",
)

# 2. Question generator: binary forecasting questions about those articles
question_generator = QuestionGenerator(
    instructions="Generate forward-looking questions about technology announcements.",
    answer_type=AnswerType(answer_type=AnswerTypeEnum.BINARY),
)

# 3. Labeler: finds the answers via web search, using the same answer type
labeler = WebSearchLabeler(answer_type=AnswerType(answer_type=AnswerTypeEnum.BINARY))

pipeline = QuestionPipeline(
    seed_generator=seed_generator,
    question_generator=question_generator,
    labeler=labeler,
)
```

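Conceptually, the three stages form a chain in which each stage's output feeds the next. The sketch below models that data flow with plain Python functions. It is not how the `lightningrod` classes are implemented. The `Article`, `Question`, and `Sample` types, `run_pipeline`, and the toy stage functions are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Article:
    title: str


@dataclass
class Question:
    text: str
    source: Article


@dataclass
class Sample:
    question: Question
    label: str


def run_pipeline(
    seeds: Iterable[Article],
    generate: Callable[[Article], list[Question]],
    label: Callable[[Question], str],
) -> Iterator[Sample]:
    """Chain the stages lazily: seeds -> questions -> labeled samples."""
    for article in seeds:
        for question in generate(article):
            yield Sample(question=question, label=label(question))


# Toy stand-ins for the question generator and labeler (hypothetical).
def toy_generate(article: Article) -> list[Question]:
    return [Question(f"Will '{article.title}' happen on schedule?", article)]


def toy_label(question: Question) -> str:
    return "Undetermined"  # a real labeler would look up what actually happened


toy_articles = [Article("Acme announces a new chip"), Article("Beta Corp plans an IPO")]
toy_samples = list(run_pipeline(toy_articles, toy_generate, toy_label))
print(len(toy_samples))  # 2
```

Passing each stage in as a callable keeps the stages independent. This mirrors how `QuestionPipeline` takes the seed generator, question generator, and labeler as separate arguments.
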
## Run the pipeline

This will collect news articles, generate questions, and find answers. The `max_questions` parameter limits how many questions to generate (useful for testing).

```python
dataset = client.transforms.run(pipeline, max_questions=10)
```

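A cap like `max_questions` is easiest to reason about when generation is lazy. If generation stops after N items, the later inputs are never processed. The sketch below shows that pattern with `itertools.islice`. It illustrates the general idea of a cap only and makes no claim about how the Lightning Rod service enforces `max_questions`. `generate_questions` is a hypothetical generator.

```python
from itertools import islice
from typing import Iterator

processed: list[str] = []  # records which inputs were actually touched


def generate_questions(topics: list[str]) -> Iterator[str]:
    """Hypothetical lazy generator that yields two questions per topic."""
    for topic in topics:
        processed.append(topic)
        yield f"Will {topic} happen by year end?"
        yield f"Will {topic} be delayed?"


topics = ["launch A", "launch B", "launch C", "launch D"]
capped = list(islice(generate_questions(topics), 3))
print(len(capped))  # 3
print(processed)    # ['launch A', 'launch B']
```

Only the first two topics are ever processed. `islice` stops requesting items once it has three, so the generator never reaches the remaining topics.
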
## View the results

Each sample in the dataset contains:
- The original news article
- A forecasting question generated from it
- The label (the answer found via web search), with a confidence score; a label can also be `Undetermined`
- A formatted prompt ready for model training

```python
samples = dataset.download()
print(f"Generated {dataset.num_rows} samples\n")

samples
```

```text
Generated 10 samples
```

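The DataFrame below uses dotted column names such as `question.question_text` and `label.label_confidence`. This suggests that each sample is a nested record flattened to one key per leaf field. Below is a minimal sketch of that kind of flattening, assuming a nested-dict layout. The `example_sample` dict is hypothetical, and `flatten` is not the SDK's `dataset.flattened()` implementation.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level with dot-separated keys."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested sections
        else:
            flat[name] = value
    return flat


# Hypothetical sample shaped like the columns shown in the DataFrame below.
example_sample = {
    "question": {"question_text": "Will the product ship by March 31?"},
    "label": {"label": 1, "label_confidence": 0.9},
}
print(flatten(example_sample))
# {'question.question_text': 'Will the product ship by March 31?', 'label.label': 1, 'label.label_confidence': 0.9}
```
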
View the results as a pandas DataFrame:

```python
%pip install pandas

from IPython.display import clear_output
clear_output()

import pandas as pd

rows = dataset.flattened()
df = pd.DataFrame(rows)

print("Sample questions and answers:")
print(df[["question.question_text", "label.label", "label.label_confidence"]].head())
```

Sample questions and answers:

|   | question.question_text | label.label | label.label_confidence |
|---|---|---|---|
| 0 | Will Wake Tech engineering and welding student... | 1 | 0.9 |
| 1 | Will IBM Sovereign Core reach full general ava... | 1 | 0.9 |
| 2 | Will Intel report its fourth-quarter earnings ... | 1 | 1.0 |
| 3 | Will Apple release an AI-powered version of Si... | 1 | 1.0 |
| 4 | Will Rongta Technology successfully complete t... | Undetermined | 1.0 |

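Before training, you will probably want to drop rows whose label is `Undetermined` (as in row 4 above). You may also want to drop rows below a confidence threshold. The sketch below works on rows shaped like the three printed columns and uses only the standard library. The 0.95 threshold, the `prompt`/`completion` field names, and the example rows are assumptions, not SDK conventions. The dataset already includes its own formatted prompt, so the JSONL export here only illustrates a generic format.

```python
import json
from typing import Any

UNDETERMINED = "Undetermined"


def trainable(rows: list[dict[str, Any]], min_confidence: float = 0.95) -> list[dict[str, Any]]:
    """Keep rows with a resolved label and confidence at or above the threshold."""
    return [
        r for r in rows
        if r["label.label"] != UNDETERMINED and r["label.label_confidence"] >= min_confidence
    ]


def to_jsonl(rows: list[dict[str, Any]]) -> str:
    """Serialize rows as hypothetical prompt/completion pairs, one JSON object per line."""
    return "\n".join(
        json.dumps({"prompt": r["question.question_text"], "completion": str(r["label.label"])})
        for r in rows
    )


# Hypothetical rows mirroring the three columns printed above.
example_rows = [
    {"question.question_text": "Will A ship?", "label.label": 1, "label.label_confidence": 1.0},
    {"question.question_text": "Will B ship?", "label.label": 1, "label.label_confidence": 0.9},
    {"question.question_text": "Will C ship?", "label.label": UNDETERMINED, "label.label_confidence": 1.0},
]
kept = trainable(example_rows)
print(len(kept))       # 1
print(to_jsonl(kept))  # {"prompt": "Will A ship?", "completion": "1"}
```

With pandas, the same filter could be applied to `df` directly with a boolean mask on the two `label.*` columns.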

## Next steps

- **Different data sources**: See examples 02-04 for GDELT, custom documents, and more
- **Different question types**: See examples 05-08 for continuous, multiple choice, and free response questions
- **Full API reference**: See [API.md](../API.md) for all options and configurations