# Adding Durability with Temporal

You've just built a research application that generates PDF reports. It works perfectly—until it doesn't.

Imagine this: Your application conducts expensive research through an LLM call (costing time and money), but then **crashes** during PDF generation due to a network outage. When you restart, everything is lost. You're back to the beginning, paying for the same LLM call again, making your users wait, and burning through your API budget.

As these workflows grow more complex—chaining multiple LLM calls, database queries, external APIs—the problem compounds. Every failure means starting over completely.

In this section, we'll solve this problem by making your application durable. You'll learn how to build GenAI applications that survive failures, recover automatically, and never lose progress.

### Challenges of GenAI Applications

* Networks can be flaky
* LLMs are often rate limited
* Tool resources (APIs and databases) go down
* LLMs are inherently non-deterministic
* How do we scale these applications?
* What happens when they take a long time to finish?
…
What else?

### These Aren't New Problems

The challenges you just identified? They're the same problems we've been solving in distributed systems for decades:

**Your Research Application in Production Reality:**
* **LLM API call** - External service that can time out, be rate limited, or be down
* **PDF generation** - File system operation that can fail due to disk space
* **User input/output** - Network operations that can be interrupted



### GenAI Applications are Distributed Systems!

**This is a distributed system!** Your "simple" application is actually:
* Multiple network calls to external services
* File system operations
* State that needs to persist across failures
* Coordination between different steps

**The challenge:** Traditional distributed systems tools weren't designed for AI workflows. They don't understand expensive LLM calls, context windows, or long-term state management.

**The good news:** You can use a platform that guarantees the _reliable execution_ of your code.

### What Normal Execution Gives Us

* Every failure means restarting from scratch
* Expensive LLM calls are repeated unnecessarily
* User experience becomes frustrating and unreliable
* No way to resume from where you left off

### What Developers Actually Want

* "Just fix the disk issue and generate the PDF from the research you already have."
* "Don't make me pay for the same LLM call twice!"
* "Don't lose my work because of a simple file system error!"

### Your Report Generation Application Needs Durability

Remember your research application from Notebook 1? Here's what happens in production:

**Scenario:** User asks for research on "sustainable energy trends":

1. LLM call succeeds - generates comprehensive research content ($2.50 in API costs)
2. PDF generation fails - disk full, permission error, or process crash
3. User has to start over completely - losing expensive work and time

We need a way to make our AI applications resilient to these failures.

### Introducing Temporal

- Technology and open source project that delivers resilience for distributed systems in a novel way.
- Supports a programming model that allows developers to code the **happy path**, while the platform provides services that compensate for a wide range of distributed system failures.
- Platform comes in the form of a service + SDKs
- SDKs are available for Go, Java, Python, PHP, TypeScript, .NET, and Ruby

### Let's Make Your GenAI Application Durable

We're about to transform your simple research application into a durable one. Here's what changes:

* Your tools will become crash-proof
* Automatic retries and recovery
* State persistence

This results in a process such as:
LLM Decision → Tool A → Result X (saved in history; on replay, the same Result X leads to the same next decision) → Next Decision

### What stays the same

* Your core logic (LLM call → PDF generation)
* Your inputs and outputs
* Your business requirements

### Package Our Inputs & Outputs for Ease of Management

For ease of use, evolution of parameters, and type checking, Temporal recommends passing and returning a single object from functions. `dataclass` is the recommended structure here, but anything serializable will work.

_Read more about inputs and outputs in [this chapter](https://temporal.talentlms.com/unit/view/id:2822) of our free Temporal 102 course._

In [None]:
# TODO: Run this code block to load it into the program
from dataclasses import dataclass

@dataclass
class LLMCallInput:
  prompt: str

@dataclass
class PDFGenerationInput:
  content: str
  filename: str = "research_pdf.pdf"
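
One reason a single dataclass argument is easier to evolve: a field with a default value can be added later without touching existing callers. The sketch below uses only the standard library. The `PDFGenerationInputV2` name and the `page_size` field are hypothetical and not part of this workshop's code, and the JSON round trip is only a stand-in for the serialization that happens when data crosses a process boundary.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PDFGenerationInputV2:  # hypothetical name, for illustration only
    content: str
    filename: str = "research_pdf.pdf"
    page_size: str = "letter"  # hypothetical new field with a default


# An existing caller that only knows about `content` keeps working.
old_style = PDFGenerationInputV2(content="Some research text")

# Round-trip through JSON, as a stand-in for crossing a process boundary.
payload = json.dumps(asdict(old_style))
restored = PDFGenerationInputV2(**json.loads(payload))

print(payload)
print(restored == old_style)
```

With separate positional parameters, the same change would require updating every call site.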

### What is an Activity?

* Functions that make external calls are “wrapped” as Activities
* An Activity is a function that is prone to failure and/or non-deterministic.
* Temporal requires that all non-deterministic code run in an Activity

Examples:
  - External API calls - LLM requests, web scraping, database queries
  - File system operations - Reading documents, writing reports, managing storage
  - Network operations - HTTP requests, email sending, data transfers
  - Resource-intensive computations - Image processing, data analysis, model inference

### What Activities Give You

* [**Automatic retries**](https://docs.temporal.io/develop/python/failure-detection#activity-retries) when external code fails
* [**Timeout handling**](https://docs.temporal.io/develop/python/failure-detection#activity-timeouts) for slow operations and detecting failures
* **Detailed visibility** of execution, including inputs/outputs for debugging
* **Automatic checkpoints** - if your Worker crashes, Activities that already completed aren't re-executed. Instead, your Workflow continues from the last known good state
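
To get a feel for what "automatic retries" with backoff looks like, here is a small standard-library sketch that computes the wait before each retry. It is not Temporal's implementation. The parameter names mirror the fields of the Python SDK's `RetryPolicy`, and the default values (1 second initial interval, coefficient of 2.0, 100 second cap) are assumed from Temporal's documented default Retry Policy, so check the docs for your SDK version.

```python
from datetime import timedelta


def retry_delays(
    attempts: int,
    initial_interval: timedelta = timedelta(seconds=1),
    backoff_coefficient: float = 2.0,
    maximum_interval: timedelta = timedelta(seconds=100),
) -> list[float]:
    """Return the wait (in seconds) before each of the first `attempts` retries."""
    delays = []
    for retry in range(attempts):
        # Each retry waits `backoff_coefficient` times longer than the last...
        delay = initial_interval.total_seconds() * (backoff_coefficient ** retry)
        # ...until the wait reaches the configured cap.
        delays.append(min(delay, maximum_interval.total_seconds()))
    return delays


print(retry_delays(9))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0, 100.0]
```

The growing delay gives a rate-limited or temporarily unavailable service time to recover instead of being hit again immediately.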

### Tasks/Tools become Activities

- To turn a function/method into an Activity, add the `@activity.defn` decorator.
- Package activity arguments into a data structure

### As an Activity, Your LLM Call is Now:
* Protected against API timeouts
* Automatically retried with backoff
* Observable for debugging

### As an Activity, Your PDF Generation is Now:
* Protected against file system errors
* Automatically retried on temporary failures
* Tracked for completion verification

### Let's Create Activities.

But first, let's set up our notebook. Run the following code blocks to install various packages and tools necessary to run this notebook.

In [None]:
# We'll first install the necessary packages for this workshop.

%pip install --quiet temporalio litellm reportlab python-dotenv nest_asyncio

### Create a `.env` File

Next you'll create a `.env` file to store your API keys.
In the file browser on the left, create a new file and name it `.env`.
Note that this file doesn't persist across notebooks or sessions.

**Note**: It may disappear as soon as you create it. This is because Google Colab hides hidden files (files that start with a `.`) by default.
To make this file appear, click the crossed-out eye icon and hidden files will appear.

Then double-click the `.env` file and add the following lines with your API key.

```
LLM_API_KEY = YOUR_API_KEY
LLM_MODEL = "openai/gpt-4o"
```

By default this notebook uses OpenAI's GPT-4o.
If you want to use a different LLM provider, look up the appropriate model name [in the documentation](https://docs.litellm.ai/docs/providers) and change the `LLM_MODEL` field and provide your API key.

In [None]:
# Create .env file
with open(".env", "w") as fh:
  fh.write("LLM_API_KEY = YOUR_API_KEY\nLLM_MODEL = openai/gpt-4o")

# Now open the file and replace YOUR_API_KEY with your API key.

In [None]:
# Load environment variables and configure LLM settings

import os
from dotenv import load_dotenv

load_dotenv(override=True)

# Get LLM_API_KEY environment variable and print it to make sure that your .env file is properly loaded.
LLM_MODEL = os.getenv("LLM_MODEL", "openai/gpt-4o")
LLM_API_KEY = os.getenv("LLM_API_KEY", None)
print("LLM API Key", LLM_API_KEY)

In [None]:
# This allows us to run the Temporal Asyncio event loop within the event loop of Jupyter Notebooks
import nest_asyncio
nest_asyncio.apply()

In [None]:
# Running this will download the Temporal CLI, which we need for this workshop.

!curl -sSf https://temporal.download/cli.sh | sh

In [None]:
# Let's Create Activities
# TODO: Run this code block to load it into the program
from temporalio import activity
from litellm import completion, ModelResponse

@activity.defn
def llm_call(input: LLMCallInput) -> ModelResponse:
    response = completion(
      model=LLM_MODEL,
      api_key=LLM_API_KEY,
      messages=[{ "content": input.prompt,"role": "user"}]
    )
    return response

In [None]:
# Step 1: Make the code an Activity. Look at the cell below for the solution.
# Step 2: Now run the code to load it into the program

from temporalio import activity
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle

# TODO: Add the Activity decorator to make this function a Temporal Activity
def create_pdf_activity(input: PDFGenerationInput) -> str:
    print("Creating PDF document...")

    doc = SimpleDocTemplate(input.filename, pagesize=letter)
    styles = getSampleStyleSheet()
    title_style = ParagraphStyle(
        'CustomTitle',
        parent=styles['Heading1'],
        fontSize=24,
        spaceAfter=30,
        alignment=1
    )

    story = []
    title = Paragraph("Research Report", title_style)
    story.append(title)
    story.append(Spacer(1, 20))
    paragraphs = input.content.split('\n\n')
    for para in paragraphs:
        if para.strip():
          p = Paragraph(para.strip(), styles['Normal'])
          story.append(p)
          story.append(Spacer(1, 12))

    doc.build(story)

    print(f"SUCCESS! PDF created: {input.filename}")
    return input.filename

In [None]:
# Optional: Run this cell to load and display the solution
from pathlib import Path
from IPython.display import display, Markdown
import os

notebook_dir = Path(os.getcwd())
solution_file = notebook_dir / "Solutions_02_Adding_Durability_with_Temporal" / "create_pdf_activity_solution.py"

code = solution_file.read_text()

print("Solution loaded:")
display(Markdown(f"```python\n{code}\n```"))

### Activities Are Called from Workflows

- You orchestrate the execution of your Activities from within a [Workflow](https://docs.temporal.io/workflow-definition#workflow-definition).
- Workflows contain the decision-making flow, but Activities perform the actual work.
- Each Activity call is recorded in the workflow history with inputs and outputs
- Workflows can wait for Activity completion, handle failures, and make decisions based on results

### Creating the Workflow

* Activities are orchestrated within a Temporal Workflow.
* Workflows must **not** make API calls, file system calls, or anything non-deterministic. That is what Activities are for.
* Workflows are async, and you define them as a class decorated with the `@workflow.defn` decorator.
* Every Workflow has a **single** entry point, which is an `async` method decorated with `@workflow.run`.
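
Why must Workflow code be deterministic? The toy sketch below, using only the standard library, compares the "commands" that the same workflow logic would issue on an original run and on a replay. It is an illustration, not Temporal's replay mechanism, and the function names are hypothetical. For values like IDs and timestamps inside a Workflow, the Python SDK offers deterministic helpers such as `workflow.uuid4()` and `workflow.now()`.

```python
import uuid


def bad_workflow(history: list[str]) -> list[str]:
    """Workflow logic that generates an ID inline (non-deterministic)."""
    report_id = str(uuid.uuid4())  # different on every run
    return [f"create_pdf:{report_id}"]


def good_workflow(history: list[str]) -> list[str]:
    """Workflow logic that reads the ID from a recorded result."""
    report_id = history[0]  # same on every run
    return [f"create_pdf:{report_id}"]


# Pretend an Activity produced this ID once and it was saved to history.
recorded_history = [str(uuid.uuid4())]

# "Original run" vs. "replay": compare the commands each run would issue.
print(bad_workflow(recorded_history) == bad_workflow(recorded_history))    # False
print(good_workflow(recorded_history) == good_workflow(recorded_history))  # True
```

When the two runs disagree, there is no way to match the recorded history to what the code now wants to do, which is why non-deterministic work belongs in Activities.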

<img src="https://i.postimg.cc/yxW08BHD/activity-workflow-chain.png" width="200"/>

### More Input/Output Packaging

Just like with Activities, Temporal recommends passing a single object to the Workflow for input and returning a single object.

In [None]:
# TODO: Run this code block to load it into the program
from dataclasses import dataclass

@dataclass
class GenerateReportInput:
    prompt: str

@dataclass
class GenerateReportOutput:
    result: str

In [None]:
# Step 1: Add your `@workflow.run` decorator
# Step 2: Notice how the Workflow calls the `llm_call` Activity.
# Step 3: Follow the pattern to call the `create_pdf_activity`.
# Step 4: Pass in your pdf_generation_input
# Step 5: A Start-to-Close timeout is the maximum amount of time a single Activity Execution can take. We recommend always setting this timeout.
# Set a Start-to-Close timeout of 10 seconds for the `create_pdf_activity`.
# Step 6: Run this code block to load it into the program
from datetime import timedelta
from temporalio import workflow

# sandboxed=False is a Notebook only requirement. You normally don't do this
@workflow.defn(sandboxed=False)
class GenerateReportWorkflow:

    # TODO: Add your `@workflow.run` decorator here
    async def run(self, input: GenerateReportInput) -> GenerateReportOutput:

        llm_call_input = LLMCallInput(prompt=input.prompt)

        research_facts = await workflow.execute_activity(
            llm_call,
            llm_call_input,
            start_to_close_timeout=timedelta(seconds=30), # maximum amount of time a single Activity Execution can take.
        )

        workflow.logger.info("Research complete!")

        pdf_generation_input = PDFGenerationInput(content=research_facts["choices"][0]["message"]["content"])

        pdf_filename = await workflow.execute_activity(
            # TODO: Call the create_pdf_activity here
            # TODO: Pass in your pdf_generation_input
            # TODO: Set the Start-to-Close timeout of 10 seconds here
        )

        return GenerateReportOutput(result=f"Successfully created research report PDF: {pdf_filename}")

In [None]:
# Optional: Run this cell to load and display the solution
from pathlib import Path
from IPython.display import display, Markdown
import os

notebook_dir = Path(os.getcwd())
solution_file = notebook_dir / "Solutions_02_Adding_Durability_with_Temporal" / "generatereportworkflow_solution.py"

code = solution_file.read_text()

print("Solution loaded:")
display(Markdown(f"```python\n{code}\n```"))
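
The Start-to-Close timeout set in the Workflow above limits a single attempt of an Activity, not the Activity as a whole. A rough standard-library analogy, assuming a hypothetical step whose first attempt hangs: each attempt gets its own time budget, and an attempt that runs over is abandoned and retried. This is only a sketch of the idea, not how Temporal implements timeouts.

```python
import asyncio


async def flaky_pdf_step(attempt: int) -> str:
    """Hypothetical step: the first attempt hangs, later attempts are fast."""
    await asyncio.sleep(1.0 if attempt == 1 else 0)
    return "research_pdf.pdf"


async def run_with_timeout(timeout_seconds: float, max_attempts: int) -> tuple[str, int]:
    """Give each attempt its own time budget; retry when one runs over."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = await asyncio.wait_for(flaky_pdf_step(attempt), timeout_seconds)
            return result, attempt
        except asyncio.TimeoutError:
            print(f"attempt {attempt} exceeded {timeout_seconds}s, retrying")
    raise RuntimeError("all attempts timed out")


result, attempts_used = asyncio.run(run_with_timeout(timeout_seconds=0.2, max_attempts=3))
print(result, attempts_used)
```

`asyncio.run` is used so the sketch works as a standalone script; in a notebook cell you can `await run_with_timeout(...)` directly.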

### Temporal Workers

* Temporal Workflows are run on [Workers](https://docs.temporal.io/workers)
* Workers wait for tasks to do, such as an Activity or Workflow Task, and execute them.

_Read more about how Workers execute Workflow and Activity tasks in [this chapter](https://temporal.talentlms.com/unit/view/id:2455) of our free Temporal 101 course._

### Running a Worker

* Workers have Workflows and Activities registered to them so the Worker knows what to execute.
* Workers find tasks by listening on a Task Queue
* Any Worker can pick up a registered Workflow or Activity

The Worker architecture turns your monolith into a modular, event-driven application!
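
As a mental model for "registered functions plus a Task Queue", here is a toy worker built on `asyncio.Queue`. Everything in it (the `toy_worker` function, the `"stop"` sentinel, the lambdas in the registry) is hypothetical and greatly simplified. A real Temporal Worker polls the Temporal Service rather than an in-process queue.

```python
import asyncio
from typing import Callable


async def toy_worker(
    task_queue: asyncio.Queue,
    registry: dict[str, Callable[[str], str]],
    results: list[str],
) -> None:
    """Poll the queue and run whichever registered function each task names."""
    while True:
        name, arg = await task_queue.get()
        if name == "stop":  # sentinel so the demo can shut down cleanly
            break
        results.append(registry[name](arg))


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    registry = {
        "llm_call": lambda prompt: f"research for {prompt!r}",
        "create_pdf": lambda content: "research_pdf.pdf",
    }
    results: list[str] = []
    worker_task = asyncio.create_task(toy_worker(queue, registry, results))

    # Something else (the Temporal Service, in the real system) adds tasks.
    await queue.put(("llm_call", "tardigrades"))
    await queue.put(("create_pdf", "some content"))
    await queue.put(("stop", ""))
    await worker_task
    return results


toy_results = asyncio.run(main())
print(toy_results)
```

The code that adds tasks never calls the functions directly; it only names them, which is what lets any worker with a matching registry pick up the work.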

<img src="https://i.postimg.cc/dQZZNGPg/worker-architecture.png" width="500"/>

### Running a Temporal Service

* The Temporal Service brings it all together
* The Temporal Service can be run locally, self-hosted, or you can use Temporal Cloud
* The service acts as the supervisor of your Workflows, Activities, and everything else

### Durable Execution

Instead of event-driven architecture, define your workflow as code and let the system track exactly where you are. Write for the happy path—no need to manage queues, events, retries, rollbacks, or state checkpoints. With durable execution, you can just focus on business logic.

<img src="https://i.postimg.cc/635g59w5/durable-execution-example.png" width="500"/>

**Run the Temporal Server Now in this Exercise Environment**:
1. To start the Temporal Server, run `temporal server start-dev` in your terminal.
2. Then in your `Ports` tab on the bottom of this screen, find `8233` and click on the Globe icon to open the Temporal Web UI.

In [None]:
# Here is code for our Worker
# Step 1: Pass in our `create_pdf_activity` into the list to register them with the Worker
# Step 2: Set the task queue that the Worker is polling to be "research"
# Step 3: Run this codeblock to load it into the program
import concurrent.futures
from temporalio.client import Client
from temporalio.worker import Worker

async def run_worker() -> None:
    # Create client connected to server at the given address
    client = await Client.connect("localhost:7233", namespace="default")

    # Run the Worker
    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as activity_executor:
        worker = Worker(
            client,
            task_queue="", # TODO Set the task queue that the Worker is polling to be "research"
            workflows=[GenerateReportWorkflow], # register the Workflow
            activities=[llm_call, __], # TODO Pass in your create_pdf_activity to register it
            activity_executor=activity_executor
        )

        print(f"Starting the worker....")
        await worker.run()

In [None]:
# Optional: Run this cell to load and display the solution
from pathlib import Path
from IPython.display import display, Markdown
import os

notebook_dir = Path(os.getcwd())
solution_file = notebook_dir / "Solutions_02_Adding_Durability_with_Temporal" / "worker_solution.py"

code = solution_file.read_text()

print("Solution loaded:")
display(Markdown(f"```python\n{code}\n```"))

### Starting the Worker

A Workflow can't execute if a Worker isn't running.

In [None]:
# Due to the limitations of Jupyter Notebooks and Google Colab, this is how
# you must start the worker in a Notebook environment
import asyncio

worker = asyncio.create_task(run_worker())

# If you are running this code in a typical Python environment, you can start
# the Worker by just calling `asyncio.run`
# if __name__ == "__main__":
#    asyncio.run(run_worker())

### Executing the Workflow

- Temporal Workflows are executed indirectly
- You request execution from the Temporal Service rather than calling the Workflow function yourself
- You do this with the [Temporal Client](https://docs.temporal.io/develop/python/temporal-client)

<img src="https://i.postimg.cc/76Mdqfjd/client.png" width="500"/>

In [None]:
# Step 1: Set the Task Queue to be the Task Queue that your Worker is polling
# Step 2: Pass in the Workflow you are running 
# Step 3: Run this code block to load it into the program
from temporalio.client import Client
import uuid

# Create client connected to server at the given address
client = await Client.connect("localhost:7233", namespace="default")

print("Welcome to the Research Report Generator!")
prompt = input("Enter your research topic or question: ").strip()

if not prompt:
    prompt = "Give me 5 fun and fascinating facts about tardigrades. Make them interesting and educational!"
    print(f"No prompt entered. Using default: {prompt}")

# Asynchronous start of a Workflow
handle = await client.start_workflow(
    # TODO Pass in the Workflow you are running
    GenerateReportInput(prompt=prompt),
    id=f"generate-research-report-workflow-{uuid.uuid4()}", # user-defined Workflow identifier, which typically has some business meaning
    task_queue="", # TODO: Set the Task Queue to be the Task Queue that your Worker is polling
)

print(f"Started workflow. Workflow ID: {handle.id}, RunID {handle.result_run_id}")

In [None]:
# Optional: Run this cell to load and display the solution
from pathlib import Path
from IPython.display import display, Markdown
import os

notebook_dir = Path(os.getcwd())
solution_file = notebook_dir / "Solutions_02_Adding_Durability_with_Temporal" / "client_solution.py"

code = solution_file.read_text()

print("Solution loaded:")
display(Markdown(f"```python\n{code}\n```"))

### Getting the Result

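The example above starts the Workflow asynchronously: `start_workflow` returns a handle as soon as the Temporal Service accepts the request, without waiting for the Workflow to finish. To get the result, `await` the handle's `result()` method.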

In [None]:
# Get the result
result = await handle.result()
print(f"Result: {result}")

# To download the report: right-click `research_pdf.pdf` in your file explorer, then click `Download`.

### Temporal Web UI

- Temporal provides a robust [Web UI](https://docs.temporal.io/web-ui) for managing Workflow Executions
- You can gain insights such as responses from Activities, execution time, and failures
- Great for debugging and understanding what's happening during your Workflow Executions.

### Exploring the Web UI

Can you locate the following items on the Web UI?

- The name of the Task Queue
- The names of the two Activities called
- The inputs and outputs of the called Activities
- Input and output of the Workflow Execution

_To See Your Web UI_: In your `Ports` tab on the bottom of this screen, find `8233` and click on the Globe icon.

### Demo (Expand for instructor notes or to run on your own)
<!--
Normal Execution Demo:
1. To demonstrate the power of durable execution, we'll first show what happens when the app runs without it. This is the code that we showed in the first notebook.
2. Clone this repository: `https://github.com/temporalio/edu-ai-workshop-agentic-loop`. The instructions are also in the README.
3. From the `demos/module_one_01_foundations_aiic_loop` directory, run `app.py` with `python app.py`.
4. When prompted, enter the prompt you want to send to OpenAI in the command line.
5. Before the process generates a PDF, kill the process.
6. Rerun the application with `python app.py` and show that the process restarted and the application has to start the research again. Emphasize that this could be very costly, because you may have to re-run through many tokens to get back to where you left off.

Durable Execution Demo:
1. Now show the durable version by switching into the `demos/module_one_02_adding_durability` directory.
2. Run the Worker with `python worker.py`.
3. Run the Workflow with `python workflow.py`.
4. When prompted, enter the prompt you want to send to OpenAI in the command line.
5. Before the process generates a PDF, kill the Worker.
6. Rerun the Worker and show that you continue right where you left off.
7. Emphasize that you lost no progress or data. The Workflow will continue by generating the PDF (available in the same directory) and completing the process successfully.
8. Show the Workflow Execution completion in the Web UI.
-->

### What is Durable Execution?

* [Durable execution](https://docs.temporal.io/evaluate/understanding-temporal) is crash-proof execution
* Retries upon failure
* Maintains application state, resuming after a crash at the point of failure
* Can run across a multitude of processes, even on different machines

### Temporal Provides Durable Execution

* Handles state, retries, and timeouts right out of the box
* Open source, MIT licensed
* Code-based approach to Workflow design
  - Instead of building custom orchestration systems, you write normal functions.
  - Because you write in a general-purpose programming language, there are no abstractions to get in your way. As AI patterns continue to evolve, general-purpose languages remain well suited to implementing them.
* Use your own tools, processes, and libraries
* Support for 7 languages (Python, Go, C#, Java, TypeScript, Ruby, PHP)

### Simulating Failure and Recovery

Let's practice experiencing failure and recovery firsthand. We'll add a new feature to our workflow: generating an executive summary before creating the PDF. 

This will demonstrate:
* How Activities automatically retry on failure
* How Temporal preserves state across Worker restarts
* How you can fix bugs without losing progress

### Step 1: Create a New Activity with an Intentional Error

We'll create a `generate_summary` Activity that:
1. Takes the research content and generates a concise summary
2. Contains an intentional error to simulate a real-world failure
3. Will automatically retry when it fails

Run the code below to add this Activity:

In [None]:
# Run this code to create the Activity with an intentional error
from temporalio import activity
from temporalio.exceptions import ApplicationError

@activity.defn
def generate_summary(input: LLMCallInput) -> ModelResponse:
    """Generate a concise summary of the research content"""
    
    # This simulates a temporary failure - maybe a database is down, 
    # or an API is temporarily unavailable
    raise ApplicationError(
        "Simulated failure: Summary service temporarily unavailable"
    )
    
    # This code would run if we remove the error above
    response = completion(
        model=LLM_MODEL,
        api_key=LLM_API_KEY,
        messages=[{"content": input.prompt, "role": "user"}]
    )
    return response

print("Activity created with intentional error!")

### Step 2: Update the Workflow to Call the Summary Activity

Now we'll modify our Workflow to:
1. Generate research content (existing)
2. **Generate a summary of that research (new!)**
3. Create the PDF with the summary (existing)

Run the code below:

In [None]:
# Updated Workflow with summary generation step
from datetime import timedelta
from temporalio import workflow

@workflow.defn(sandboxed=False)
class GenerateReportWorkflow:

    @workflow.run
    async def run(self, input: GenerateReportInput) -> GenerateReportOutput:

        # Step 1: Generate research content
        llm_call_input = LLMCallInput(prompt=input.prompt)

        research_facts = await workflow.execute_activity(
            llm_call,
            llm_call_input,
            start_to_close_timeout=timedelta(seconds=30),
        )

        workflow.logger.info("Research complete!")

        # Step 2: Generate a summary (NEW - this will fail initially!)
        summary_prompt = f"Provide a 2-3 sentence executive summary of this research: {research_facts['choices'][0]['message']['content']}"
        summary_input = LLMCallInput(prompt=summary_prompt)

        summary_result = await workflow.execute_activity(
            generate_summary,
            summary_input,
            start_to_close_timeout=timedelta(seconds=30),
        )

        workflow.logger.info("Summary generated!")
        
        # Step 3: Create PDF with summary prepended
        full_content = f"EXECUTIVE SUMMARY:\n{summary_result['choices'][0]['message']['content']}\n\n{research_facts['choices'][0]['message']['content']}"
        pdf_generation_input = PDFGenerationInput(content=full_content)

        pdf_filename = await workflow.execute_activity(
            create_pdf_activity,
            pdf_generation_input,
            start_to_close_timeout=timedelta(seconds=10),
        )

        return GenerateReportOutput(result=f"Successfully created research report PDF with summary: {pdf_filename}")

print("Workflow updated!")

### Step 3: Register the New Activity with the Worker

We need to tell the Worker about our new `generate_summary` Activity:

In [None]:
# Updated Worker with the new Activity registered
import concurrent.futures
from temporalio.client import Client
from temporalio.worker import Worker

async def run_worker() -> None:
    client = await Client.connect("localhost:7233", namespace="default")

    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as activity_executor:
        worker = Worker(
            client,
            task_queue="research",
            workflows=[GenerateReportWorkflow],
            activities=[llm_call, create_pdf_activity, generate_summary],  # Added generate_summary
            activity_executor=activity_executor
        )

        print(f"Starting the worker with summary Activity registered....")
        await worker.run()

print("Worker function updated!")

In [None]:
# Kill the old worker and start the new one
import asyncio

x = worker.cancel()

worker = asyncio.create_task(run_worker())
print("New worker started!")

### Step 4: Start a New Workflow Execution

Let's start a new Workflow that will call our failing Activity:

In [None]:
# Start a new Workflow execution
from temporalio.client import Client
import uuid

client = await Client.connect("localhost:7233", namespace="default")

prompt = "Give me 3 interesting facts about dolphins"
print(f"Starting workflow with prompt: {prompt}")

handle = await client.start_workflow(
    GenerateReportWorkflow,
    GenerateReportInput(prompt=prompt),
    id=f"generate-report-with-summary-{uuid.uuid4()}",
    task_queue="research",
)

print(f"Started workflow. Workflow ID: {handle.id}")
print(f"The workflow is now running and will retry the failing Activity automatically!")

### Step 5: Observe Automatic Retries in the Web UI

**Go to your Temporal Web UI now!**

You should see:
1. Your Workflow is **Running** (not Failed!)
2. The `llm_call` Activity completed successfully ✓
3. The `generate_summary` Activity shows a **Pending Activity** with retry attempts

**Click on the Pending Activity to see:**
- The error message. What does it say?
- What is the current retry attempt number?
- What is the countdown until the next retry?

**Key insight:** Notice that the expensive `llm_call` Activity isn't being re-executed! Temporal saved its result and won't waste money calling the LLM again. Only the failing Activity retries.

### Step 6: Fix the Error

Now let's "fix" our simulated failure by removing the error. In a real scenario, this could be:
- A database coming back online
- An API endpoint being fixed
- A network issue being resolved

Fix the code by removing or commenting out the error.

In [None]:
# Step 1: Fix the code by removing or commenting out the error.
# Step 2: Run the code block
from temporalio import activity
from litellm import completion

@activity.defn
def generate_summary(input: LLMCallInput) -> ModelResponse:
    """Generate a concise summary of the research content"""
    
    # TODO: Remove or comment out this simulated failure to "fix" the Activity
    raise ApplicationError(
        "Simulated failure: Summary service temporarily unavailable"
    )

    # Once the error above is removed, this code runs and the Activity works
    response = completion(
        model=LLM_MODEL,
        api_key=LLM_API_KEY,
        messages=[{"content": input.prompt, "role": "user"}]
    )
    return response

print("Activity fixed! Error removed.")

### Step 7: Restart the Worker with Fixed Code

Now restart the Worker so it picks up the fixed Activity code:

In [None]:
# Kill the old worker and start the new one
import asyncio

x = worker.cancel()
worker = asyncio.create_task(run_worker())
print("New worker started!")

### Step 8: Observe Successful Completion

**Refresh your Web UI and observe:**

1. The `generate_summary` Activity now completes successfully!
2. The `create_pdf_activity` executes and creates the PDF
3. The entire Workflow shows **Completed** status

**What just happened?**
- Your Workflow **preserved all state** through the failure
- The expensive `llm_call` was **never re-executed** (saving you money!)
- When you fixed the bug, Temporal **automatically continued** from where it left off
- No manual intervention needed - just fix the code and restart the Worker

**This is the power of durable execution!** In production, this means:
- API outages don't lose your progress
- You can deploy bug fixes without restarting workflows
- Your users never lose work
- You never pay twice for the same LLM call

### Durable Execution - State Preservation

Temporal relies on a [Replay mechanism](https://docs.temporal.io/encyclopedia/event-history/event-history-python) to recover from failure.
As your program progresses, Temporal saves the input and output from function calls to the history.
This allows a failed program to restart right where it left off.
This can also save us a lot of money since we aren't re-burning through tokens!

For example:

User request: "Research sustainable energy trends"
- ✓ Step 1: LLM research call → Output saved to history
- ✓ Step 2: Generate summary → Output saved to history  
- ✗ Step 3: Create PDF → CRASH!

On restart:
- Temporal replays Steps 1 & 2 from history (no actual execution)
- Continues from Step 3 with the same inputs

_Read more about how Replay works in [this chapter](https://temporal.talentlms.com/unit/view/id:2847) of our free Temporal 102 course or read [this page](https://docs.temporal.io/encyclopedia/event-history/event-history-python#How-History-Replay-Provides-Durable-Execution) in our docs._
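
The restart behavior described above can be sketched in a few lines of plain Python. This is a toy model under simplifying assumptions, not Temporal's actual Replay implementation: a dictionary stands in for the Event History, and the `run_step` and `toy_workflow` names are hypothetical.

```python
from typing import Callable

history: dict[str, str] = {}   # stand-in for the Event History
executions: list[str] = []     # records which steps really ran


def run_step(name: str, fn: Callable[[], str]) -> str:
    """Return the recorded result if present; otherwise run and record it."""
    if name in history:
        return history[name]
    executions.append(name)
    result = fn()
    history[name] = result  # only reached if fn() did not raise
    return result


def toy_workflow(pdf_should_crash: bool) -> str:
    research = run_step("llm_call", lambda: "research text")
    summary = run_step("generate_summary", lambda: f"summary of {research}")

    def create_pdf() -> str:
        if pdf_should_crash:
            raise RuntimeError("disk full")
        return "research_pdf.pdf"

    return run_step("create_pdf", create_pdf)


try:
    toy_workflow(pdf_should_crash=True)   # first run crashes at step 3
except RuntimeError:
    pass

filename = toy_workflow(pdf_should_crash=False)  # "restart": steps 1-2 come from history
print(executions)  # ['llm_call', 'generate_summary', 'create_pdf', 'create_pdf']
print(filename)    # research_pdf.pdf
```

Only `create_pdf` appears twice in `executions`: the two earlier steps ran once, and on the second pass their saved results were reused.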

In [None]:
# Kill any worker to prepare for the exercise.
x = worker.cancel()

if x:
  print("Worker killed")
else:
  print("Worker was not running. Nothing to kill")

--

## Exercise 2 - Adding Durability

* In these exercises you will:
  * Transform your LLM calls and tool executions into Activities
  * Use a Temporal Workflow to orchestrate your Activities
  * Observe how Temporal handles your errors
  * Debug your error and observe your Workflow Execution successfully complete
* Go to the **Exercise** Directory and open the **02_Adding_Durability_with_Temporal** Directory
* Open _Practice_ and follow the instructions
* If you get stuck, raise your hand and someone will come by and help. You can also check the `Solution` directory for the answers

### What's Next?

This chapter introduced you to the **concept** of Durable Execution with Temporal. Further your learning with these resources:

### Resources

- Our free [Temporal 102 Course](https://learn.temporal.io/courses/temporal_102/python/) which covers these concepts (Workflows, Activities, Replay, and more) in more detail
- A Temporal [tutorial in the Python SDK](https://learn.temporal.io/getting_started/python/hello_world_in_python/) that showcases how to get started with Temporal
- Our [docs page](https://docs.temporal.io/encyclopedia/event-history/event-history-python#How-History-Replay-Provides-Durable-Execution) describing how Temporal uses Replay to provide durable execution in more detail