# 📓 The GenAI Revolution Cookbook

**Title:** Build Durable LLM Workflows with temporal-python-sdk [2025 Guide]

**Description:** Ship production-grade document workflows using Temporal Python SDK signals for human approvals, safe retries, and idempotent activities that survive restarts.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



## Why This Approach Works

Building a durable, human-in-the-loop document approval workflow requires more than just async tasks and retry loops. Traditional approaches fail when processes crash mid-execution, duplicate expensive LLM calls on retries, or lose track of approval state across restarts.

Temporal solves these problems by treating your workflow as durable code that survives failures, retries failed activities automatically, and waits indefinitely for human signals without polling. You get automatic retries with exponential backoff, deterministic execution that replays from event history, and built-in observability, all without building your own orchestration layer. Retries are only safe when the activities themselves are idempotent; that part is your responsibility, and it is the focus of Steps 2 and 3.

In this tutorial, you'll build a production-grade document approval workflow using the Temporal Python SDK. The system will generate drafts via OpenAI, wait for human edits or approval via signals, apply revisions idempotently, and persist the final document—all while handling transient failures and maintaining full auditability.

## How It Works

Here's the end-to-end flow:

1. **Client starts workflow** with a document ID and prompt
2. **Workflow executes draft activity** (idempotent LLM call via OpenAI)
3. **Workflow waits for signals**: human submits edits or approves
4. **On edits**: workflow increments version, executes revision activity (idempotent), loops back to wait
5. **On approval**: workflow executes persist activity (idempotent) and completes
6. **Retries on transient failures**: exponential backoff with configurable max attempts
7. **Durability across restarts**: workflow resumes from last checkpoint if worker crashes

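Activities encapsulate side effects (LLM calls, storage) and store each result with an atomic file write keyed by `(doc_id, version)`, so a retried activity returns the stored result instead of calling the LLM again. This is a check-then-write pattern rather than a strict at-most-once guarantee: if a worker dies after the LLM responds but before the file is written, the retry repeats the call. Workflows orchestrate activities and signals deterministically, never calling external APIs directly. Signals let the workflow wait for a human without polling.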
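
The loop in steps 3 to 5 can be sketched in plain Python, without Temporal, to make the version bookkeeping concrete. This is only an illustration: `simulate_review` and the `("edit", text)` / `("approve", None)` event tuples are hypothetical stand-ins for signals, not part of the Temporal SDK.

```python
from typing import List, Optional, Tuple

# Hypothetical event shape: ("edit", text) or ("approve", None).
Event = Tuple[str, Optional[str]]


def simulate_review(events: List[Event]) -> Tuple[int, List[str], bool]:
    """Replay human events and return (version, applied_edits, approved)."""
    version = 1  # Version 1 is the initial LLM draft.
    applied: List[str] = []
    for kind, payload in events:
        if kind == "edit":
            version += 1  # Each edit produces a new document version.
            applied.append(payload or "")
        elif kind == "approve":
            return version, applied, True  # Approval ends the loop.
    return version, applied, False  # No approval yet: still waiting.


print(simulate_review([
    ("edit", "tighten intro"),
    ("edit", "add warranty"),
    ("approve", None),
]))
# prints (3, ['tighten intro', 'add warranty'], True)
```

In the real workflow the events arrive as signals at arbitrary times, and the "still waiting" branch is a durable wait rather than a return.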

## Setup & Installation

This tutorial runs locally and requires a Temporal dev server. You'll need Python 3.9+ (recent `temporalio` releases have dropped Python 3.8), an OpenAI API key, and the Temporal CLI.

**Install dependencies:**

In [None]:
!pip install temporalio openai python-dotenv

**Install Temporal CLI** (choose your platform):

- **macOS**: `brew install temporal`
- **Linux**: `curl -sSf https://temporal.download/cli.sh | sh`
- **Windows**: Download from [temporal.io/cli](https://temporal.io/cli)

**Set your OpenAI API key** as an environment variable:

- **macOS/Linux**: `export OPENAI_API_KEY="sk-..."`
- **Windows (PowerShell)**: `$env:OPENAI_API_KEY="sk-..."`
- **Windows (CMD)**: `setx OPENAI_API_KEY "sk-..."` (takes effect in new terminal windows, not the current one)

Alternatively, create a `.env` file with `OPENAI_API_KEY=sk-...` in your project directory.

## Step-by-Step Implementation

### Step 1: Define Shared Models

Create a custom exception for non-retryable validation errors. Listed in the retry policy later, it tells Temporal to fail fast on bad inputs instead of retrying until the policy is exhausted.

In [None]:
%%writefile models.py
# models.py
# Purpose: Define shared exceptions and models for workflow activities.

class ValidationError(Exception):
    """
    Raised when input is invalid and should not be retried by Temporal activities.

    Args:
        message (str): Description of the validation error.
    """
    pass
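
Temporal matches non-retryable errors by type name, which is why the workflow in Step 4 passes `ValidationError.__name__` rather than the class itself. The sketch below mimics that string comparison in plain Python; `should_retry` is a hypothetical helper written for illustration, not SDK code.

```python
from typing import List


class ValidationError(Exception):
    """Same shape as the exception defined in models.py."""


def should_retry(exc: Exception, non_retryable_error_types: List[str]) -> bool:
    """Return False when the exception's class name is listed as non-retryable."""
    return type(exc).__name__ not in non_retryable_error_types


non_retryable = [ValidationError.__name__]  # The string "ValidationError".
print(should_retry(ValidationError("Prompt too short"), non_retryable))  # prints False
print(should_retry(RuntimeError("timeout"), non_retryable))  # prints True
```

Because the match is on a string, renaming the exception class changes what has to appear in the retry policy, so deriving the name with `__name__` keeps the two in sync.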

### Step 2: Build Idempotent Storage Helpers

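Activities must be idempotent to prevent duplicate side effects on retries. These helpers use atomic file writes keyed by `(doc_id, version)` so that a retried activity can find the stored result and skip the LLM call. The atomic rename also means a reader never sees a half-written file.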

In [None]:
%%writefile storage_utils.py
# storage_utils.py
# Purpose: Provide atomic, idempotent JSON storage helpers for activities.

import os
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

BASE = Path("storage")
BASE.mkdir(exist_ok=True)

def key_path(*parts: str) -> Path:
    """
    Generate a safe, unique file path for a given key.

    Args:
        *parts (str): Components to build the key.

    Returns:
        Path: Path to the JSON file for this key.
    """
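    # Replace path separators and other unsafe characters so a key cannot escape BASE.
    raw = "_".join(parts)
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in raw)
    return BASE / f"{safe}.json"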

def atomic_write_json(path: Path, data: Dict[str, Any]) -> None:
    """
    Atomically write JSON data to disk so readers never see a partial file.

    Args:
        path (Path): Target file path.
        data (Dict[str, Any]): Data to write.

    Raises:
        OSError: If writing fails.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the target.
    # UTF-8 is set explicitly because ensure_ascii=False can emit non-ASCII text.
    with tempfile.NamedTemporaryFile(
        "w", delete=False, dir=str(path.parent), encoding="utf-8"
    ) as tmp:
        json.dump(data, tmp, ensure_ascii=False, indent=2)
        tmp.flush()
        os.fsync(tmp.fileno())
        tmp_name = tmp.name
    os.replace(tmp_name, path)

def read_json_if_exists(path: Path) -> Optional[Dict[str, Any]]:
    """
    Read JSON data from a file if it exists.

    Args:
        path (Path): File path.

    Returns:
        Optional[Dict[str, Any]]: Parsed JSON data or None if file does not exist.
    """
    if path.exists():
        # Read with the same encoding used by atomic_write_json.
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return None
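
To see the write-then-rename behavior in isolation, the following sketch repeats the core of `atomic_write_json` and runs it twice against the same target in a throwaway directory. The `demo` function and the file name are illustrative assumptions, and nothing here touches the real `storage/` folder.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List, Tuple


def atomic_write_json(path: Path, data: dict) -> None:
    """Write to a temp file next to the target, then rename it into place."""
    with tempfile.NamedTemporaryFile(
        "w", delete=False, dir=str(path.parent), encoding="utf-8"
    ) as tmp:
        json.dump(data, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
        tmp_name = tmp.name
    os.replace(tmp_name, path)  # Replaces the target in a single step.


def demo() -> Tuple[dict, List[str]]:
    """Overwrite one key twice and report the stored value and directory listing."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "draft_doc-123_1.json"
        atomic_write_json(target, {"draft": "first"})
        atomic_write_json(target, {"draft": "second"})
        stored = json.loads(target.read_text(encoding="utf-8"))
        names = sorted(p.name for p in Path(d).iterdir())
        return stored, names


print(demo())
# prints ({'draft': 'second'}, ['draft_doc-123_1.json'])
```

Only the final file remains after a successful write, because the temp file is renamed rather than copied.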

### Step 3: Implement Idempotent Activities

Activities encapsulate side effects (LLM calls, storage) to keep workflows deterministic. Each activity checks for existing output before calling OpenAI, ensuring idempotency across retries.

In [None]:
%%writefile activities.py
# activities.py
# Purpose: Define idempotent LLM and storage activities for the workflow.

import os
from typing import Dict, Optional

from dotenv import load_dotenv
from openai import AsyncOpenAI
from temporalio import activity

from models import ValidationError
from storage_utils import key_path, read_json_if_exists, atomic_write_json

# Load OPENAI_API_KEY (and optionally OPENAI_MODEL) from a local .env file, if one exists.
load_dotenv()

_openai_client: Optional[AsyncOpenAI] = None

def get_openai() -> AsyncOpenAI:
    """
    Lazily initialize and return a singleton async OpenAI client.

    Returns:
        AsyncOpenAI: OpenAI client instance.
    """
    global _openai_client
    if _openai_client is None:
        _openai_client = AsyncOpenAI()
    return _openai_client

def _idempotent_get(path_key: str):
    """
    Helper to check for existing output by key.

    Args:
        path_key (str): Unique key for the output.

    Returns:
        Tuple[Path, Optional[Dict]]: Path and existing data if present.
    """
    path = key_path(path_key)
    return path, read_json_if_exists(path)

@activity.defn
async def generate_draft(doc_id: str, prompt: str, version: int, model: Optional[str] = None) -> Dict:
    """
    Idempotently generate a draft for (doc_id, version).
    If a draft already exists, return it instead of re-calling the LLM.

    Args:
        doc_id (str): Document identifier.
        prompt (str): Prompt for the LLM.
        version (int): Version number.
        model (Optional[str]): OpenAI model name.

    Returns:
        Dict: Draft data.

    Raises:
        ValidationError: If prompt is invalid.
        RuntimeError: For transient failures (for retry testing).
    """
    # Test hook: FAIL_FIRST is read in the worker process, where activities run.
    if os.getenv("FAIL_FIRST", "0") == "1":
        os.environ["FAIL_FIRST"] = "0"
        raise RuntimeError("Injected transient failure")

    # Validate before the cache lookup so bad input always fails fast.
    if not prompt or len(prompt) < 10:
        raise ValidationError("Prompt too short for drafting")

    model = model or os.getenv("OPENAI_MODEL", "gpt-4o-mini")
    path, existing = _idempotent_get(f"draft_{doc_id}_{version}")
    if existing:
        return existing

    client = get_openai()
    # Await the async client so the worker's event loop is not blocked during the call.
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Draft a clear, concise document based on:\n{prompt}\nTone: professional."}],
    )
    text = resp.choices[0].message.content

    result = {
        "doc_id": doc_id,
        "version": version,
        "draft": text,
        "source": "llm",
    }
    atomic_write_json(path, result)
    return result

@activity.defn
async def revise_draft(doc_id: str, base_text: str, edits: str, version: int, model: Optional[str] = None) -> Dict:
    """
    Idempotently create a revision for (doc_id, version) from base_text + edits.

    Args:
        doc_id (str): Document identifier.
        base_text (str): Current draft text.
        edits (str): Human edits to apply.
        version (int): New version number.
        model (Optional[str]): OpenAI model name.

    Returns:
        Dict: Revised draft data.

    Raises:
        ValidationError: If edits are empty.
    """
    # Validate before the cache lookup so bad input always fails fast.
    if not edits or len(edits.strip()) == 0:
        raise ValidationError("No edits provided")

    model = model or os.getenv("OPENAI_MODEL", "gpt-4o-mini")
    path, existing = _idempotent_get(f"revision_{doc_id}_{version}")
    if existing:
        return existing

    client = get_openai()
    instructions = (
        "Apply the user's edits faithfully. Keep structure, improve clarity, and "
        "do not invent facts. Return the full revised document."
    )
    # Await the async client so the worker's event loop is not blocked during the call.
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{instructions}\n\n--- Current Draft ---\n{base_text}\n\n--- Edits ---\n{edits}"}],
    )
    text = resp.choices[0].message.content

    result = {
        "doc_id": doc_id,
        "version": version,
        "draft": text,
        "source": "llm_revision",
    }
    atomic_write_json(path, result)
    return result

@activity.defn
async def persist_final(doc_id: str, content: str, version: int) -> Dict:
    """
    Idempotently persist the final approved document.

    Args:
        doc_id (str): Document identifier.
        content (str): Final document content.
        version (int): Version number.

    Returns:
        Dict: Final persisted document data.
    """
    path, existing = _idempotent_get(f"final_{doc_id}_{version}")
    if existing:
        return existing

    result = {"doc_id": doc_id, "version": version, "content": content, "status": "final"}
    atomic_write_json(path, result)
    return result
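
The check-before-call pattern used by these activities can be exercised without OpenAI or the filesystem. The sketch below swaps in an in-memory dict and a fake model that records its calls; `generate_once`, `fake_llm`, and `store` are hypothetical names used only for this illustration.

```python
from typing import Callable, Dict, List


def generate_once(store: Dict[str, dict], doc_id: str, version: int,
                  prompt: str, llm: Callable[[str], str]) -> dict:
    """Return the stored draft for (doc_id, version), calling the model only if absent."""
    key = f"draft_{doc_id}_{version}"
    if key in store:
        return store[key]  # A retry lands here and skips the model call.
    result = {"doc_id": doc_id, "version": version, "draft": llm(prompt)}
    store[key] = result
    return result


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for the OpenAI call that records every invocation."""
    calls.append(prompt)
    return f"Draft for: {prompt}"


store: Dict[str, dict] = {}
first = generate_once(store, "doc-123", 1, "Summarize the TurboWidget", fake_llm)
retry = generate_once(store, "doc-123", 1, "Summarize the TurboWidget", fake_llm)
other = generate_once(store, "doc-123", 1, "A completely different prompt", fake_llm)
print(len(calls), retry is first, other is first)
# prints 1 True True
```

The last call shows a consequence of this key design: because the key ignores the prompt, a different prompt under the same `(doc_id, version)` still returns the cached draft. If that matters for your use case, one option is to fold a hash of the inputs into the key.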

### Step 4: Define the Workflow with Signals and Queries

Workflows orchestrate activities and wait for signals without calling external APIs directly. This keeps execution deterministic and replayable. Signals enable human-in-the-loop without polling.

In [None]:
%%writefile workflow.py
# workflow.py
# Purpose: Define the deterministic document approval workflow with signals and queries.

from datetime import timedelta
from typing import Optional, Dict
from temporalio import workflow
from temporalio.common import RetryPolicy
from models import ValidationError

with workflow.unsafe.imports_passed_through():
    from activities import generate_draft, revise_draft, persist_final

@workflow.defn
class DocumentWorkflow:
    """
    Durable, human-in-the-loop document approval workflow.

    Signals:
        submit_edit(content): Submit human edits for revision.
        approve(): Approve the current draft.

    Queries:
        status(): Get current workflow status.
    """

    def __init__(self) -> None:
        self.doc_id: Optional[str] = None
        self.prompt: Optional[str] = None
        self.current: Optional[Dict] = None
        self.version: int = 0
        self._approved: bool = False
        self._pending_edits: Optional[str] = None

    @workflow.signal
    def submit_edit(self, content: str) -> None:
        """
        Signal: Submit human edits for the current draft.

        Args:
            content (str): Edits to apply.
        """
        self._pending_edits = content

    @workflow.signal
    def approve(self) -> None:
        """
        Signal: Approve the current draft.
        """
        self._approved = True

    @workflow.query
    def status(self) -> Dict:
        """
        Query: Get current workflow status.

        Returns:
            Dict: Status fields for observability.
        """
        return {
            "doc_id": self.doc_id,
            "version": self.version,
            "approved": self._approved,
            "has_pending_edits": self._pending_edits is not None,
            "current_excerpt": (self.current["draft"][:120] + "...") if self.current else None,
        }

    @workflow.run
    async def run(self, doc_id: str, prompt: str, model: Optional[str] = None) -> Dict:
        """
        Main workflow logic: orchestrates draft, revision, and approval.

        Args:
            doc_id (str): Document identifier.
            prompt (str): Initial prompt for LLM.
            model (Optional[str]): OpenAI model name.

        Returns:
            Dict: Final persisted document data.
        """
        self.doc_id = doc_id
        self.prompt = prompt
        self.version = 1

        # Options shared by every activity call in this workflow.
        activity_opts = dict(
            start_to_close_timeout=timedelta(seconds=60),
            schedule_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=2),
                backoff_coefficient=2.0,
                maximum_interval=timedelta(seconds=30),
                maximum_attempts=6,
                non_retryable_error_types=[ValidationError.__name__],
            ),
        )

        # execute_activity takes multiple activity arguments through the args keyword.
        self.current = await workflow.execute_activity(
            generate_draft,
            args=[self.doc_id, self.prompt, self.version, model],
            **activity_opts,
        )

        while True:
            await workflow.wait_condition(
                lambda: self._approved or self._pending_edits is not None
            )
            # Apply pending edits first so an edit that arrives together with
            # an approval is not silently dropped.
            if self._pending_edits is not None:
                edits = self._pending_edits
                self._pending_edits = None
                self.version += 1
                self.current = await workflow.execute_activity(
                    revise_draft,
                    args=[self.doc_id, self.current["draft"], edits, self.version, model],
                    **activity_opts,
                )
                continue
            if self._approved:
                break

        final = await workflow.execute_activity(
            persist_final,
            args=[self.doc_id, self.current["draft"], self.version],
            **activity_opts,
        )
        return final
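
The retry policy above implies a concrete wait schedule: each retry waits `initial_interval * backoff_coefficient ** n`, capped at `maximum_interval`. The sketch below computes that schedule for the values used in this workflow; `retry_delays` is a hypothetical helper for illustration and is not part of the SDK.

```python
from datetime import timedelta
from typing import List


def retry_delays(initial: timedelta, coefficient: float,
                 maximum: timedelta, maximum_attempts: int) -> List[timedelta]:
    """Return the wait before each retry; N attempts means N - 1 waits."""
    delays = []
    for retry in range(maximum_attempts - 1):
        delay = initial * (coefficient ** retry)
        delays.append(min(delay, maximum))  # Cap at maximum_interval.
    return delays


delays = retry_delays(
    initial=timedelta(seconds=2),
    coefficient=2.0,
    maximum=timedelta(seconds=30),
    maximum_attempts=6,
)
print([d.total_seconds() for d in delays])
# prints [2.0, 4.0, 8.0, 16.0, 30.0]
print(sum(delays, timedelta()).total_seconds())
# prints 60.0
```

The waits add up to one minute. Since each attempt may itself run for up to the 60-second `start_to_close_timeout`, the 5-minute `schedule_to_close_timeout` can end the activity before all six attempts are used if every attempt times out.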

### Step 5: Create the Worker

The worker polls the task queue to execute workflows and activities reliably. It connects to the Temporal server and registers your workflow and activities.

In [None]:
%%writefile worker.py
# worker.py
# Purpose: Register and run the Temporal worker for workflows and activities.

import asyncio
import os
from temporalio.client import Client
from temporalio.worker import Worker
from workflow import DocumentWorkflow
from activities import generate_draft, revise_draft, persist_final

TASK_QUEUE = "doc-approval-q"

async def main():
    """
    Start the Temporal worker to process workflow and activity tasks.

    Raises:
        Exception: If connection or worker startup fails.
    """
    address = os.getenv("TEMPORAL_ADDRESS", "localhost:7233")
    client = await Client.connect(address)
    worker = Worker(
        client,
        task_queue=TASK_QUEUE,
        workflows=[DocumentWorkflow],
        activities=[generate_draft, revise_draft, persist_final],
    )
    print("Worker started on task queue:", TASK_QUEUE)
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())

### Step 6: Build the Client

The client starts workflows, sends signals (for edits and approval), queries status, and waits for results. This emulates human interaction with the workflow.

In [None]:
%%writefile client.py
# client.py
# Purpose: Start, signal, and query the document approval workflow.

import asyncio
import os
import time
from temporalio.client import Client
from workflow import DocumentWorkflow

TASK_QUEUE = "doc-approval-q"

async def main():
    """
    Start a new workflow, submit edits, approve, and print results.

    Raises:
        Exception: If workflow or client operations fail.
    """
    address = os.getenv("TEMPORAL_ADDRESS", "localhost:7233")
    client = await Client.connect(address)

    workflow_id = f"doc-approval-{int(time.time())}"
    handle = await client.start_workflow(
        DocumentWorkflow.run,
        id=workflow_id,
        task_queue=TASK_QUEUE,
        args=["doc-123", "Draft a two-paragraph product summary for ACME TurboWidget."],
    )
    print("Started workflow:", workflow_id)

    status = await handle.query(DocumentWorkflow.status)
    print("Initial status:", status)

    await handle.signal(DocumentWorkflow.submit_edit, "Please emphasize safety and warranty; remove pricing details.")

    status = await handle.query(DocumentWorkflow.status)
    print("After edits:", status)

    await handle.signal(DocumentWorkflow.approve)

    result = await handle.result()
    print("Final result:", result)

if __name__ == "__main__":
    asyncio.run(main())

## Run and Validate

**Start the Temporal dev server** in a terminal:

In [None]:
temporal server start-dev

**Start the worker** in a second terminal:

In [None]:
python worker.py

**Run the client** in a third terminal:

In [None]:
python client.py

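You'll see output showing the workflow ID, initial status, status after edits, and the final persisted document. Check the `storage/` directory for JSON files named after the document ID and version (for example, `draft_doc-123_1.json`) to verify idempotency. Because the client hard-codes `doc-123`, a second run reuses these files instead of calling OpenAI again; delete `storage/` or change the document ID for a fresh run.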
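
A quick way to inspect what the activities stored is to list the JSON files together with the `doc_id` and `version` fields they contain. This is a small sketch; `list_artifacts` is a hypothetical helper, and it assumes the files follow the layout written by the activities above.

```python
import json
from pathlib import Path
from typing import List, Optional, Tuple


def list_artifacts(storage_dir: Path) -> List[Tuple[str, Optional[str], Optional[int]]]:
    """Return (file name, doc_id, version) for every JSON file in storage_dir."""
    rows = []
    if not storage_dir.is_dir():
        return rows  # Nothing has been written yet.
    for path in sorted(storage_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        rows.append((path.name, data.get("doc_id"), data.get("version")))
    return rows


if __name__ == "__main__":
    for row in list_artifacts(Path("storage")):
        print(row)
```

Running it before and after a second `python client.py` should show the same set of files, which is the visible sign that the rerun reused stored results.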

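**Test retry behavior** by stopping the worker and restarting it with `FAIL_FIRST=1`, then running the client again. The flag is read inside `generate_draft`, which runs in the worker process, so setting it in the client's terminal has no effect:

In [None]:
export FAIL_FIRST=1
python worker.py

The first `generate_draft` attempt fails with the injected error, and Temporal retries it after the 2-second initial interval. The activity resets the flag in the worker's environment, so the second attempt succeeds and the workflow completes, demonstrating durable retries.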

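**Test non-retryable errors** by changing the prompt in `client.py` to something too short (e.g., `"Hi"`), ideally with a fresh document ID or an empty `storage/` directory so no cached draft is involved. The activity raises `ValidationError`, Temporal does not retry it, and the workflow fails, so the client raises an error instead of printing a final result. This shows how to short-circuit bad inputs.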

**Inspect workflow history** in the Temporal Web UI at `http://localhost:8233`. Search for your workflow ID to see the full event log, including activity executions, signals, and retries.

## Conclusion

You've built a production-grade, human-in-the-loop document approval workflow using Temporal and OpenAI. The system generates drafts, waits for human signals (edits or approval), applies revisions idempotently, and persists the final document—all while handling transient failures and maintaining full auditability.

**Key takeaways:**

- **Activities encapsulate side effects** (LLM calls, storage) to keep workflows deterministic
- **Idempotency via storage keys** prevents duplicate LLM calls on retries
- **Signals enable human-in-the-loop** without polling or timeouts
- **RetryPolicy uses exponential backoff** to tame transient failures and bounds attempts with `maximum_attempts`
- **ValidationError short-circuits bad inputs** to avoid wasting retries

**Next steps:**

- **Add multi-stage approvals**: Extend the workflow to require multiple approvers (e.g., editor, legal, exec) with separate signals and state tracking
- **Integrate Temporal Cloud**: Deploy to Temporal Cloud for managed infrastructure, encryption at rest, and global namespaces
- **Add metrics and alerts**: Instrument activities with custom metrics (e.g., LLM latency, token usage) and set up alerts for retry exhaustion or validation failures
- **Implement versioning**: Use Temporal's versioning features (for example, `workflow.patched()` in the Python SDK) to safely deploy workflow changes without breaking in-flight executions
- **Chunk large documents**: Split long documents into paragraphs, process in parallel activities, and merge results deterministically
- **Enhance visibility**: Add custom search attributes (e.g., `doc_id`, `status`) to enable filtering and dashboards in the Temporal Web UI
- **Encrypt sensitive payloads**: Use a custom payload codec with Temporal's data converter to encrypt prompts and drafts before they are stored in workflow history, and keep API keys out of workflow inputs entirely
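
As a starting point for the chunking idea, the sketch below splits a document into paragraphs, processes them concurrently, and merges the results in their original order. It uses plain `asyncio` with a fake worker; `split_paragraphs`, `process_in_parallel`, and `fake_revise` are hypothetical names, and in a real workflow each chunk would be handed to an activity instead.

```python
import asyncio
from typing import Awaitable, Callable, List


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


async def process_in_parallel(
    text: str, worker: Callable[[str], Awaitable[str]]
) -> str:
    """Process every paragraph concurrently and join results in input order."""
    chunks = split_paragraphs(text)
    # gather returns results in the order the awaitables were passed in,
    # regardless of which one finishes first.
    results = await asyncio.gather(*(worker(chunk) for chunk in chunks))
    return "\n\n".join(results)


async def fake_revise(chunk: str) -> str:
    """Stand-in for a per-chunk revision activity."""
    await asyncio.sleep(0)
    return chunk.upper()


print(asyncio.run(process_in_parallel("intro text\n\nbody text", fake_revise)))
```

Merging by input position rather than completion order is what keeps the output deterministic.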