# Quick Start Guide

Generate forecasting datasets from news articles in minutes.

Lightning Rod automatically creates training data by:
1. Collecting news articles from a time period
2. Generating forecasting questions from those articles
3. Finding the answers automatically using web search
4. Returning a dataset ready for model training

This uses the "future as label" approach: we generate questions about future events, then use what actually happened as the ground truth labels.
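
To make the idea concrete, here is a minimal, library-independent sketch of "future as label". The `ForecastExample` class, the `resolve` helper, the question text, and the dates are all hypothetical illustrations and are not part of the Lightning Rod SDK. The point is only that a question is written at one date and its label is filled in later from the real outcome.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ForecastExample:
    """Hypothetical container for one training example (not an SDK type)."""
    question: str
    asked_on: date               # date of the article the question came from
    resolves_on: date            # date by which the outcome is known
    label: Optional[int] = None  # filled in later from what actually happened


def resolve(example: ForecastExample, outcome: bool, today: date) -> ForecastExample:
    """Attach the real-world outcome as the label once the question has resolved."""
    if today < example.resolves_on:
        raise ValueError("Outcome is not known yet; the question is still open.")
    example.label = int(outcome)
    return example


example = ForecastExample(
    question="Will product X ship by the end of March?",  # made-up question
    asked_on=date(2025, 1, 15),
    resolves_on=date(2025, 3, 31),
)
resolved = resolve(example, outcome=True, today=date(2025, 4, 1))
print(resolved.label)
```

Because the label comes from events that happened after the article was published, no human annotation is needed in this scheme.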

## Install the SDK

In [None]:
%pip install lightningrod-ai==0.1.5

from IPython.display import clear_output
clear_output()

## Set up the client

**Set your API key:**

Go to the Secrets section (🔑 icon in the left sidebar) and add a secret named `LIGHTNINGROD_API_KEY`.

In [None]:
from google.colab import userdata
from lightningrod import LightningRod

api_key = userdata.get("LIGHTNINGROD_API_KEY")
lr = LightningRod(api_key=api_key)
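
If you run this notebook outside Colab, `google.colab` is not importable. The following is a minimal sketch of one way to handle that, assuming you have exported the key as an environment variable with the same name. The `load_api_key` helper is hypothetical and not part of the SDK.

```python
import os


def load_api_key(name: str = "LIGHTNINGROD_API_KEY") -> str:
    """Return the key from Colab Secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            key = userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted; fall through to env.
            key = None
        if key:
            return key

    key = os.environ.get(name)
    if not key:
        raise RuntimeError(
            f"Set {name} in Colab Secrets or as an environment variable."
        )
    return key
```

The returned value can then be passed to the client as in the cell above, for example `LightningRod(api_key=load_api_key())`.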

## Build a pipeline

A pipeline has three components:

1. **Seed Generator** - Collects news articles from a time period
2. **Question Generator** - Creates forecasting questions from the articles
3. **Labeler** - Finds the answers automatically using web search

Let's build a simple pipeline:

In [None]:
from datetime import datetime, timedelta
from lightningrod import (
    NewsSeedGenerator,
    QuestionGenerator,
    WebSearchLabeler,
    QuestionPipeline,
    AnswerType,
    AnswerTypeEnum,
)

seed_generator = NewsSeedGenerator(
    start_date=datetime.now() - timedelta(days=30),
    end_date=datetime.now(),
    search_query="technology announcements",
)

answer_type = AnswerType(answer_type=AnswerTypeEnum.BINARY)

question_generator = QuestionGenerator(
    instructions="Generate forward-looking questions about technology announcements.",
    answer_type=answer_type,
)

# Labeler automatically finds answers to questions using web search
labeler = WebSearchLabeler(answer_type=answer_type)

pipeline = QuestionPipeline(
    seed_generator=seed_generator,
    question_generator=question_generator,
    labeler=labeler,
)
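
The seed generator's time window is just a pair of `datetime` values. As a small optional sketch, the hypothetical helper `last_n_days` below (not part of the SDK) computes both bounds from a single reference time, so the window is exactly `n` days wide and can be pinned to a fixed date when you want reproducible runs.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def last_n_days(n: int, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the n days up to `now` (defaults to the current time)."""
    if n <= 0:
        raise ValueError("n must be a positive number of days")
    end = now if now is not None else datetime.now()
    start = end - timedelta(days=n)
    return start, end


# Pinning `now` makes the window reproducible across runs.
start_date, end_date = last_n_days(30, now=datetime(2025, 1, 31))
print(start_date, end_date)
```

The two values can be passed as `start_date` and `end_date` to `NewsSeedGenerator` exactly as in the cell above.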

> Note: Running the pipeline in the next step can take a few minutes to complete.

## Run the pipeline

This will collect news articles, generate questions, and find answers. The `max_questions` parameter limits how many questions to generate (useful for testing).

In [None]:
dataset = lr.transforms.run(pipeline, max_questions=10)

## View the results

Each sample in the dataset contains:
- The original news article
- A forecasting question generated from it
- The answer (found via web search) with a confidence score
- A formatted prompt ready for model training

View the results as a pandas DataFrame:

In [None]:
%pip install pandas

from IPython.display import clear_output
clear_output()

import pandas as pd

samples = dataset.download()
print(f"Generated {dataset.num_rows} samples\n")

# Flatten nested fields into dotted column names such as "question.question_text"
rows = dataset.flattened()
df = pd.DataFrame(rows)

print("Sample questions and answers:")
df[["question.question_text", "label.label", "label.label_confidence"]].head()

Generated 10 samples

Sample questions and answers:


|   | question.question_text | label.label | label.label_confidence |
|---|---|---|---|
| 0 | Will Ring's new Fire Watch feature successfull... | Undetermined | 1.0 |
| 1 | Will Lego's first 'Smart Play' sets associated... | 1 | 1.0 |
| 2 | Will the newly announced Motorola Fold support... | 1 | 1.0 |
| 3 | Will the motorola signature smartphone be rele... | 1 | 1.0 |
| 4 | Will Lego's new Smart Bricks be available for ... | 1 | 1.0 |
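
Some questions may come back labeled `Undetermined`, as in row 0 above. Before training a binary classifier you will probably want to drop those. The sketch below assumes `rows` is a list of dictionaries keyed by the dotted column names shown above and that binary labels appear as `0` or `1` (as integers or strings). Both are assumptions about the output format, the `keep_resolved` helper is hypothetical, and the example rows are made-up placeholders rather than real pipeline output.

```python
from typing import Any, Dict, List


def keep_resolved(
    rows: List[Dict[str, Any]], min_confidence: float = 0.9
) -> List[Dict[str, Any]]:
    """Keep rows with a 0/1 label and a confidence of at least `min_confidence`."""
    resolved = []
    for row in rows:
        label = str(row.get("label.label", "")).strip()
        confidence = float(row.get("label.label_confidence") or 0.0)
        if label in {"0", "1"} and confidence >= min_confidence:
            # Copy the row and store the label as an int for training.
            resolved.append({**row, "label.label": int(label)})
    return resolved


# Placeholder rows for illustration only.
example_rows = [
    {"question.question_text": "Question A", "label.label": "Undetermined", "label.label_confidence": 1.0},
    {"question.question_text": "Question B", "label.label": 1, "label.label_confidence": 1.0},
    {"question.question_text": "Question C", "label.label": "0", "label.label_confidence": 0.5},
]
clean_rows = keep_resolved(example_rows)
print(len(clean_rows))
```

With the real data you would call `keep_resolved(rows)` and then build the DataFrame from the filtered list.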


## Next steps

- **Different data sources**: See examples 02-04 for GDELT, custom documents, and more
- **Different question types**: See examples 05-08 for continuous, multiple choice, and free response questions
- **Full API reference**: See [API.md](../API.md) for all options and configurations