# COLLAGE Pipeline Demo

All logic lives in the `pipeline/` package. Each step follows the
**BaseStep** contract: `ingest → transform → request → validate → end`.

A single `items` list flows through the pipeline. Items are cumulative
dicts — each step's output is merged onto the item via `to_state_dict()`.
Downstream steps read prior outputs via `Model.from_state_dict(item)`.

```
load_products → room_recommendation → style_recommendation
```

| Step | Reads from item | Adds to item |
|------|----------------|--------------|
| `load_products()` | — | `Product` |
| `RoomRecommendationStep` | `Product` | `Room` |
| `StyleRecommendationStep` | `Product`, `Room` | `Style` |
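
The merge-and-read-back behaviour described above can be sketched as follows. This is a minimal illustration, not the package's actual code: it uses stdlib dataclasses as a stand-in (the real models appear to rely on `model_dump()`), and both the `StepOutput` base shown here and the choice of keying each output by its class name are assumptions inferred from the table.

```python
from dataclasses import asdict, dataclass


@dataclass
class StepOutput:
    """Hypothetical stand-in for the pipeline's StepOutput base class."""

    def to_state_dict(self) -> dict:
        # Assumption: each output is stored under its class name, e.g. {"Room": {...}}.
        return {type(self).__name__: asdict(self)}

    @classmethod
    def from_state_dict(cls, item: dict):
        # Rebuild a typed object from the entry stored under the class name.
        return cls(**item[cls.__name__])


@dataclass
class Product(StepOutput):
    sku: str
    name: str


@dataclass
class Room(StepOutput):
    name: str
    reasoning: str


# An item starts with the Product and accumulates one key per step.
item = {}
item.update(Product(sku="OAK-TBL-001", name="Oakwood Dining Table").to_state_dict())
item.update(Room(name="Dining Room", reasoning="example").to_state_dict())

# A downstream step reads a prior output back as a typed object.
room = Room.from_state_dict(item)
print(sorted(item))
print(room.name)
```

Because every step only adds a key, a later step can read any earlier output without knowing which step produced it.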

In [1]:
# Set up the LLM API key (not needed for the real thing, where we will use gcloud)
import os

from dotenv import load_dotenv

load_dotenv()

if not os.getenv("ANTHROPIC_API_KEY"):
    raise EnvironmentError(
        "ANTHROPIC_API_KEY is not set. "
        "Add ANTHROPIC_API_KEY=sk-ant-... to a .env file in the repo root."
    )

print("API key loaded.")

API key loaded.


In [2]:
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s — %(message)s")
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("httpcore").setLevel(logging.WARNING)

from pipeline import (
    LLMClient,
    RoomRecommendationStep,
    StyleRecommendationStep,
    load_products,
)

---

## Build and Run

In [3]:
llm_client = LLMClient(model="claude-3-5-haiku-20241022", max_workers=15)

steps = [
    RoomRecommendationStep(name="room_rec", llm_client=llm_client),
    StyleRecommendationStep(name="style_rec", llm_client=llm_client),
]

print(f"Pipeline: {' → '.join(s.name for s in steps)}")

Pipeline: room_rec → style_rec


In [4]:
state = load_products()

for step in steps:
    state.update(step(state))

INFO pipeline.steps.load_products — [load_products] run_id=fe68be49-cc5f-4d00-8715-1bf5a14ed9f7, 5 products loaded
INFO pipeline.steps.base_step — [room_rec] === starting ===
INFO pipeline.steps.base_step — [room_rec] ingest: 5 items, keys: ['Product']
INFO pipeline.steps.base_step — [room_rec] transform: 5 requests
INFO pipeline.steps.base_step — [room_rec] request: 5 responses
INFO pipeline.steps.base_step — [room_rec] validate: 15 outputs
INFO pipeline.steps.base_step — [room_rec] end: 15 items, keys: ['Product', 'Room']
INFO pipeline.steps.base_step — [room_rec] === done ===
INFO pipeline.steps.base_step — [style_rec] === starting ===
INFO pipeline.steps.base_step — [style_rec] ingest: 15 items, keys: ['Product', 'Room']
INFO pipeline.steps.base_step — [style_rec] transform: 15 requests
INFO pipeline.steps.base_step — [style_rec] request: 15 responses
INFO pipeline.steps.base_step — [style_rec] validate: 45 outputs
INFO pipeline.steps.base_step — [style_rec] end: 45 items, keys: ['

In [5]:
# summary of LLM usage
llm_client

LLMClient(
	model='claude-3-5-haiku-20241022',
	max_workers=15,
	requests=20,
	input_tokens=11435,
	output_tokens=8409,
	total_tokens=19844
)

In [8]:
# examine one output
from pipeline import ITEM_DATA


print(f'we got {len(state[ITEM_DATA])} results')
item = state[ITEM_DATA][0]
print("first result:")
item

we got 45 results
first result:


{'Product': {'sku': 'OAK-TBL-001',
  'name': 'Oakwood Dining Table',
  'category': 'furniture',
  'material': 'solid oak',
  'price': 1299.0},
 'Room': {'name': 'Dining Room',
  'reasoning': "The Oakwood Dining Table is perfectly designed for a dining room, given its solid oak construction and classic furniture category. Its substantial material and price point suggest it's a high-quality dining table meant to be the centerpiece of a formal or traditional dining space, providing an elegant surface for meals and gatherings."},
 'Style': {'name': 'Rustic Modern',
  'color_palette': ['warm beige', 'deep forest green', 'charcoal gray'],
  'reasoning': "The solid oak dining table is perfect for a Rustic Modern style, which celebrates natural materials and craftsmanship. The warm tones of oak wood pair beautifully with earthy, muted colors that highlight its natural grain and texture. Charcoal gray and forest green accents can create depth and contrast, while maintaining a connection to natu

In [9]:
# Show how a step gets typed access to a prior step's output from this item
from pipeline.models import Room

Room.from_state_dict(item)

Room(name='Dining Room', reasoning="The Oakwood Dining Table is perfectly designed for a dining room, given its solid oak construction and classic furniture category. Its substantial material and price point suggest it's a high-quality dining table meant to be the centerpiece of a formal or traditional dining space, providing an elegant surface for meals and gatherings.")

---

## Extending the Pipeline

- **Add two new dataclasses**: one for the expected format of the LLM response, and one for a single result if the step fans out. For example, a `RoomResponse` contains 1-3 `Room` objects, and the seed product is fanned out into one item per `Room`. The final output (`Room`) must subclass `StepOutput`.
- **Add a new step**: subclass `BaseStep` and implement `transform` and `validate`, using the dataclasses you just wrote.
- **Fan-out in `validate`**: unpack the LLM response into individual `StepOutput` objects, one per unit of work for the next step, and return `(source_item, output)` pairs. `validate` must discard invalid results (logging them somewhere is optional).
- **Typed downstream access**: use `Model.from_state_dict(item)` in `transform` or `validate` to get typed access to any prior step's output.
- **Disk serialization**: `to_state_dict()` writes `model_dump()` dicts, so state is JSON-serializable out of the box. `BaseStep.ingest()` and `BaseStep.end()` just hand off the raw dict, so you can add your own serialization or deserialization around them if you'd like.
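
The fan-out rule in `validate` can be sketched as below. This is a self-contained illustration under stated assumptions: the real `BaseStep.validate` signature is not shown in this notebook, so `validate` and `merge` here are hypothetical free functions, and the `Room` and `RoomResponse` classes are simplified stdlib stand-ins for the package's models.

```python
from dataclasses import asdict, dataclass


@dataclass
class Room:
    # Simplified stand-in for the single-result model.
    name: str
    reasoning: str


@dataclass
class RoomResponse:
    # Simplified stand-in for the LLM response model: a list of raw dicts.
    rooms: list


def validate(pairs):
    """Fan out (source_item, response) pairs into (source_item, Room) pairs."""
    outputs = []
    for source_item, response in pairs:
        for raw in response.rooms:
            name = raw.get("name")
            # Discard invalid results: a room needs a non-empty string name.
            if not isinstance(name, str) or not name.strip():
                continue
            room = Room(name=name, reasoning=raw.get("reasoning", ""))
            outputs.append((source_item, room))
    return outputs


def merge(pairs):
    # Each valid output becomes its own item: the source item plus one new key.
    return [{**source_item, "Room": asdict(room)} for source_item, room in pairs]


seed = {"Product": {"sku": "OAK-TBL-001"}}
response = RoomResponse(
    rooms=[
        {"name": "Dining Room", "reasoning": "fits a dining table"},
        {"name": "", "reasoning": "invalid: empty name"},
        {"name": "Kitchen", "reasoning": "works in an eat-in kitchen"},
    ]
)

items = merge(validate([(seed, response)]))
print([item["Room"]["name"] for item in items])
```

One seed item and three raw rooms yield two items here, since the entry with the empty name is dropped. The same mechanism would explain the 5 → 15 → 45 item counts in the run log above.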

In [7]:
state.keys()

dict_keys(['items'])

In [11]:
len(state['items'])
state['items'][0].keys()

dict_keys(['sku', 'Product', 'Room', 'Style'])
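
Since the state is a plain dict holding a list of plain dicts, it should round-trip through JSON without custom encoders. A minimal sketch, assuming the state holds only JSON-native values; the sample state and the `state.json` file name are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample shaped like the notebook's state: {"items": [item, ...]}.
sample_state = {
    "items": [
        {
            "Product": {"sku": "OAK-TBL-001", "price": 1299.0},
            "Room": {"name": "Dining Room"},
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "state.json"
    # Write the state to disk, then read it back.
    path.write_text(json.dumps(sample_state, indent=2), encoding="utf-8")
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored == sample_state)
```

After reloading, `Model.from_state_dict(item)` can be applied to each restored item as before, since it only needs the raw dict.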

In [None]:
from pipeline.models import Room
Room.from_state_dict(state[ITEM_DATA][0])