# Narrative Optimization Testbed - Quick Start

This notebook demonstrates the core concepts and workflow of the narrative optimization research framework.

## Overview

The framework tests whether **"better stories win"**: that is, whether narrative-driven feature engineering outperforms purely statistical baselines.
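
For orientation, a "statistical baseline" in this sense could be as simple as TF-IDF word weights fed into a linear classifier. The sketch below is an assumption about what such a baseline might look like, not the framework's own baseline code. The tiny corpus and the `baseline` name are made up for illustration, and only standard scikit-learn calls are used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny made-up corpus, used only so the sketch runs without downloading data.
train_docs = [
    "the shuttle reached orbit after launch",
    "nasa plans a new lunar mission",
    "the rocket engine failed during the test",
    "rendering polygons with a fast graphics card",
    "jpeg and mpeg image compression formats",
    "ray tracing produces realistic 3d images",
]
train_labels = [
    "sci.space", "sci.space", "sci.space",
    "comp.graphics", "comp.graphics", "comp.graphics",
]

# Word-weight features followed by a linear classifier:
# no narrative features are involved at any point.
baseline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", LogisticRegression(max_iter=1000, random_state=42)),
])
baseline.fit(train_docs, train_labels)

test_docs = ["a new rocket launch to orbit", "compression of jpeg images"]
predictions = baseline.predict(test_docs)
print(list(predictions))
```

A narrative pipeline would then be judged by whether it beats a model like this on the same train/test split.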


In [10]:
# Setup
import sys
from pathlib import Path

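# Add the project root to sys.path so that the `src` package is importable
# (assumes the notebook is run from a subdirectory of the project root)
project_root = Path.cwd().parent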
sys.path.insert(0, str(project_root))

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

print("✓ Imports successful")


✓ Imports successful


## 1. Load Toy Data

We use a small subset of the 20 Newsgroups dataset (four categories, 400 training and 100 test documents) as a generic testbed.


In [11]:
from src.utils.toy_data import quick_load_toy_data

# Load or generate toy dataset
data = quick_load_toy_data(data_dir='../data/toy')

X_train = data['X_train']
y_train = data['y_train']
X_test = data['X_test']
y_test = data['y_test']
target_names = data['target_names']

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Categories: {target_names}")
print(f"\nExample document (first 200 chars):\n{X_train[0][:200]}...")


Loaded existing toy dataset
Training samples: 400
Test samples: 100
Categories: ['alt.atheism', 'comp.graphics', 'sci.space', 'talk.religion.misc']

Example document (first 200 chars):
I am working on a project where we are going to be including
both still and moving grapics within a database.  Of course
JPEG and MPEG come to mind as the formats of choice for the
various files.  How...


## 2. Build a Narrative Pipeline

Let's create a simple narrative pipeline that tests whether semantic features improve classification.


In [12]:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from src.transformers.semantic import SemanticNarrativeTransformer
from src.pipelines.narrative_pipeline import NarrativePipeline

# Build pipeline with narrative metadata
pipeline = NarrativePipeline(
    narrative_name="Semantic Understanding Test",
    hypothesis="Semantic embeddings capture meaning better than word counts",
    expected_outcome="Improved classification by understanding deeper meaning"
)

# Add transformation steps
pipeline.add_step(
    'semantic_features',
    SemanticNarrativeTransformer(n_components=30, n_clusters=8),
    rationale="Extract semantic embeddings and cluster membership"
)

pipeline.add_step(
    'scaler',
    StandardScaler(),
    rationale="Normalize features for classifier"
)

pipeline.add_step(
    'classifier',
    LogisticRegression(max_iter=1000, random_state=42),
    rationale="Linear classifier on semantic features"
)

# Build the sklearn Pipeline
sklearn_pipeline = pipeline.build()

print("✓ Pipeline built successfully!")
print(f"Steps: {list(sklearn_pipeline.named_steps.keys())}")


✓ Pipeline built successfully!
Steps: ['semantic_features', 'scaler', 'classifier']
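
To make the role of the narrative metadata concrete, here is a minimal sketch of how a wrapper could store a hypothesis and a rationale alongside each step. `NarrativeSketch` and `NarrativeStep` are hypothetical names invented for this illustration. This is not the implementation of `NarrativePipeline` in `src.pipelines.narrative_pipeline`, whose internals are not shown in this notebook. The sketch uses only the standard library and placeholder objects instead of real transformers.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class NarrativeStep:
    """One pipeline step together with the reason it is there."""
    name: str
    transformer: Any
    rationale: str


@dataclass
class NarrativeSketch:
    """Hypothetical stand-in for NarrativePipeline (illustration only)."""
    narrative_name: str
    hypothesis: str
    expected_outcome: str
    steps: List[NarrativeStep] = field(default_factory=list)

    def add_step(self, name: str, transformer: Any, rationale: str) -> "NarrativeSketch":
        # Step names must be unique, as they are in an sklearn Pipeline.
        if any(step.name == name for step in self.steps):
            raise ValueError(f"Duplicate step name: {name!r}")
        self.steps.append(NarrativeStep(name, transformer, rationale))
        return self

    def as_sklearn_steps(self) -> List[Tuple[str, Any]]:
        # (name, estimator) pairs are the format sklearn.pipeline.Pipeline accepts.
        return [(step.name, step.transformer) for step in self.steps]

    def describe(self) -> str:
        # Human-readable summary of the narrative and why each step exists.
        lines = [
            f"Narrative: {self.narrative_name}",
            f"Hypothesis: {self.hypothesis}",
            f"Expected outcome: {self.expected_outcome}",
        ]
        lines += [
            f"  {i}. {step.name}: {step.rationale}"
            for i, step in enumerate(self.steps, start=1)
        ]
        return "\n".join(lines)


sketch = NarrativeSketch(
    narrative_name="Semantic Understanding Test",
    hypothesis="Semantic embeddings capture meaning better than word counts",
    expected_outcome="Improved classification by understanding deeper meaning",
)
# object() placeholders stand in for real transformers and estimators.
sketch.add_step("semantic_features", object(), "Extract semantic embeddings and cluster membership")
sketch.add_step("scaler", object(), "Normalize features for classifier")
sketch.add_step("classifier", object(), "Linear classifier on semantic features")

print(sketch.describe())
```

Keeping the rationale next to each step means the written hypothesis travels with the model definition, so an experiment report can state what was expected as well as what was measured.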


## Next Steps

This notebook is only a quick introduction. To run the full workflow, use:

```bash
python run_experiment.py -e 01_baseline_comparison
```

See the documentation in `docs/` for more details on creating custom transformers and experiments.
