# Exponential Backoff and Retry Configuration with Amazon Bedrock in AG2

Author: [Priyanshu Deshmukh](https://github.com/priyansh4320)

This notebook demonstrates how to configure **exponential backoff and retry behavior** for Amazon Bedrock API calls in AG2. Proper retry configuration helps handle transient errors, rate limits, and network issues gracefully.

## What are Retry Configurations?

Retry configurations enable you to:
- **Handle transient errors**: Automatically retry failed requests due to temporary network issues
- **Manage rate limits**: Use exponential backoff to respect API rate limits
- **Improve reliability**: Ensure your applications are resilient to temporary failures
- **Control retry behavior**: Fine-tune how many retries and what strategy to use

## How Bedrock Implements Retries

AG2's Bedrock client relies on boto3's retry configuration system, which supports:

1. **Total Max Attempts**: Maximum number of total attempts (initial + retries)
2. **Max Attempts**: Legacy parameter for maximum retry attempts
3. **Retry Modes**: Different strategies for handling retries
   - `legacy`: Pre-existing retry behavior
   - `standard`: Standardized retry rules (boto3 defaults to 3 total attempts in this mode)
   - `adaptive`: Retries with additional client-side throttling
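
boto3 performs the waiting between attempts internally, so none of this has to be written by hand. As a rough illustration of what "exponential backoff" means, the sketch below computes capped, jittered delays. The function name `backoff_delay` and the values `base=2.0` and `max_backoff=20.0` are assumptions chosen for illustration; this is not AG2 or botocore code.

```python
import random


def backoff_delay(retry_number: int, base: float = 2.0, max_backoff: float = 20.0, rng=random.random) -> float:
    """Return a jittered delay in seconds before the given retry (1-based).

    Illustrative only: the ceiling doubles with each retry, is capped at
    max_backoff, and is scaled by a random factor in [0, 1).
    """
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    ceiling = min(base ** (retry_number - 1), max_backoff)
    return rng() * ceiling


# Upper bounds (jitter factor fixed at 1.0) for the first six retries
ceilings = [backoff_delay(n, rng=lambda: 1.0) for n in range(1, 7)]
print(ceilings)  # [1.0, 2.0, 4.0, 8.0, 16.0, 20.0]
```

The jitter spreads retries from many clients over time, which is why the actual delay is usually shorter than the ceiling.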

## Requirements

- Python >= 3.10
- AG2 installed with bedrock extra: `pip install ag2[bedrock]`
- AWS credentials configured (via environment variables, IAM role, or AWS credentials file)

## Retry Configuration Parameters

### Key Parameters

- **`total_max_attempts`** (int): Maximum number of total attempts (initial + retries)
  - Preferred over `max_attempts`
  - Maps to `AWS_MAX_ATTEMPTS` environment variable
  - Example: `5` means 1 initial attempt + 4 retries = 5 total attempts

- **`max_attempts`** (int): Maximum number of retry attempts (legacy)
  - Example: `2` means 2 retries after initial request
  - `0` means no retries
  - If not specified, boto3's `legacy` mode allows 4 retries (5 total attempts); Part 1 lists the defaults that AG2 applies

- **`mode`** (str): Retry strategy mode
  - `"legacy"`: Pre-existing retry behavior
  - `"standard"`: Standardized retry rules (boto3 defaults to 3 total attempts in this mode)
  - `"adaptive"`: Retries with client-side throttling (best for rate limits; the boto3 documentation describes this mode as experimental)

### Important Notes

- If both `total_max_attempts` and `max_attempts` are provided, `total_max_attempts` takes precedence
- `total_max_attempts` is preferred because it aligns with AWS environment variables
- `adaptive` mode is recommended for handling rate limits and throttling
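
The relationship between the two counters can be sketched as a small normalization step. `normalize_retries` is a hypothetical helper written for this notebook, not part of AG2 or boto3; it only encodes the two rules stated above: `total_max_attempts` wins when both are present, and `max_attempts` counts retries only, so it is one less than the total.

```python
def normalize_retries(retries: dict) -> dict:
    """Return a copy of `retries` expressed only in terms of total attempts."""
    normalized = dict(retries)
    if "total_max_attempts" in normalized:
        # total_max_attempts takes precedence; drop the legacy key if present.
        normalized.pop("max_attempts", None)
    elif "max_attempts" in normalized:
        # max_attempts counts retries only, so add 1 for the initial request.
        normalized["total_max_attempts"] = normalized.pop("max_attempts") + 1
    return normalized


print(normalize_retries({"total_max_attempts": 7, "max_attempts": 3, "mode": "adaptive"}))
# {'total_max_attempts': 7, 'mode': 'adaptive'}
print(normalize_retries({"max_attempts": 2, "mode": "standard"}))
# {'mode': 'standard', 'total_max_attempts': 3}
```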

## Installation

Install required packages if not already installed:

In [None]:
%pip install ag2[bedrock] --upgrade

## Setup: Import Libraries and Configure AWS Credentials

In [None]:
import os

from dotenv import load_dotenv

from autogen import ConversableAgent, LLMConfig

load_dotenv()


print("Libraries imported successfully!")

## Part 1: Basic Retry Configuration

Let's start with a simple configuration using default retry settings:

In [None]:
# Basic configuration with default retry settings
llm_config_default = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "aws_profile_name": os.getenv("AWS_PROFILE"),
        # Default retry: total_max_attempts=5, max_attempts=5, mode="standard"
    },
)

print("Default retry configuration created!")
print("Default settings:")
print("  - total_max_attempts: 5")
print("  - max_attempts: 5")
print("  - mode: standard")

## Part 2: Custom Retry Configuration - Total Max Attempts

Configure the total number of attempts (initial + retries):

In [None]:
# Configuration with custom total_max_attempts
llm_config_custom_attempts = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 10,  # 1 initial + 9 retries = 10 total attempts
        "mode": "standard",
    },
)

print("Custom retry configuration created!")
print("Settings:")
print("  - total_max_attempts: 10 (1 initial + 9 retries)")
print("  - mode: standard")

## Part 3: Retry Modes Comparison

### Mode 1: Legacy Mode

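Legacy mode uses the pre-existing retry behavior. The first cell below defines a structured-output model (`MathReasoning`) that the adaptive-mode configuration later in this part passes as its `response_format`; the second cell creates the legacy configuration: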

In [None]:
# Define structured output model for math problem solving
from pydantic import BaseModel


class Step(BaseModel):
    """Represents a single step in solving a math problem."""

    explanation: str  # What operation or reasoning is being performed
    output: str  # The result of this step


class MathReasoning(BaseModel):
    """Complete structured response for a math problem solution."""

    steps: list[Step]  # List of all steps taken
    final_answer: str  # The final answer

    def format(self) -> str:
        """Format the structured output for human-readable display."""
        steps_output = "\n".join(
            f"Step {i + 1}: {step.explanation}\n  Output: {step.output}" for i, step in enumerate(self.steps)
        )
        return f"{steps_output}\n\nFinal Answer: {self.final_answer}"

In [None]:
# Legacy retry mode
llm_config_legacy = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 5,
        "mode": "legacy",  # Pre-existing retry behavior
    },
)

print("Legacy mode configuration created!")

### Mode 2: Standard Mode (Recommended)

Standard mode applies standardized retry rules. boto3 defaults to 3 total attempts in this mode; the example below raises that to 5:

In [None]:
# Standard retry mode (default)
llm_config_standard = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 5,
        "mode": "standard",  # Standardized retry rules
    },
)

print("Standard mode configuration created!")

### Mode 3: Adaptive Mode (Best for Rate Limits)

Retries with additional client-side throttling:

In [None]:
# Adaptive retry mode (best for handling rate limits)
llm_config_adaptive = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 8,
        "mode": "adaptive",  # Retries with client-side throttling
        "response_format": MathReasoning,
    },
)

print("Adaptive mode configuration created!")
print("Adaptive mode is recommended for:")
print("  - Handling rate limits")
print("  - Managing throttling")
print("  - High-throughput scenarios")
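
Client-side throttling means the client may delay a request before sending it, based on how much capacity it believes is available. A token bucket is one common way to build such a limiter. The class below is a simplified, hypothetical illustration of that idea (all names and numbers are invented here); it is not botocore's adaptive algorithm, which also adjusts the sending rate dynamically.

```python
class TokenBucket:
    """Minimal token bucket: each request spends one token; tokens refill over time."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum tokens that can accumulate
        self.tokens = capacity    # start with a full bucket
        self.last = now           # timestamp of the last refill

    def wait_time(self, now: float) -> float:
        """Spend one token and return how long the caller should wait first."""
        # Refill according to the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= 1
        if self.tokens >= 0:
            return 0.0
        # Bucket is overdrawn: wait until the deficit has been refilled.
        return -self.tokens / self.rate


bucket = TokenBucket(rate=2.0, capacity=2.0)
waits = [bucket.wait_time(now=0.0) for _ in range(4)]
print(waits)  # [0.0, 0.0, 0.5, 1.0]
```

The first two requests go out immediately; later ones are spaced out so the average rate stays at two requests per second.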

## Part 4: Complete Retry Configuration Examples

### Example 1: High-Reliability Configuration

For critical applications that need maximum retry attempts:

In [None]:
# High-reliability configuration
llm_config_high_reliability = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 10,  # More retries for reliability
        "mode": "adaptive",  # Best for handling various error types
    },
)

print("High-reliability configuration created!")

### Example 2: Fast-Fail Configuration

For applications that need quick failure detection:

In [None]:
# Fast-fail configuration
llm_config_fast_fail = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 2,  # Minimal retries for fast failure
        "mode": "standard",
    },
)

print("Fast-fail configuration created!")

### Example 3: Rate-Limit Optimized Configuration

For handling rate limits and throttling:

In [None]:
# Rate-limit optimized configuration
llm_config_rate_limit = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 8,
        "mode": "adaptive",  # Best for rate limit handling
    },
)

print("Rate-limit optimized configuration created!")

## Part 5: Creating Agents with Retry Configuration

Create agents with different retry configurations:

In [None]:
# Agent with adaptive retry mode
agent_adaptive = ConversableAgent(
    name="adaptive_agent",
    llm_config=llm_config_adaptive,
    system_message="You are a helpful assistant.",
    max_consecutive_auto_reply=1,
    human_input_mode="NEVER",
)

print(f"Agent '{agent_adaptive.name}' created with adaptive retry mode!")

# Agent with high-reliability configuration
agent_reliable = ConversableAgent(
    name="reliable_agent",
    llm_config=llm_config_high_reliability,
    system_message="You are a reliable assistant that handles errors gracefully.",
    max_consecutive_auto_reply=1,
    human_input_mode="NEVER",
)

print(f"Agent '{agent_reliable.name}' created with high-reliability retry config!")

## Part 6: Testing Retry Behavior

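Run a request through the adaptive-mode agent. Retries happen inside the client and only if Bedrock returns a transient error, so a successful call looks the same as one without retry configuration: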

In [None]:
# Test with adaptive retry mode
print("=== Testing Adaptive Retry Mode ===")

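response = agent_adaptive.run(
    message="What is 2 + 2?",
    max_turns=1,
)
# process() is called for its side effects (it consumes and prints the run's events),
# so keep the response object itself rather than the return value of process().
response.process()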

## Part 7: Inspecting Retry Configuration

Inspect the actual retry configuration used by the client:

In [None]:
from autogen.oai.bedrock import BedrockClient

# Create a client to inspect retry config
client = BedrockClient(
    aws_region=os.getenv("AWS_REGION", "eu-north-1"),
    aws_access_key=os.getenv("AWS_ACCESS_KEY"),
    aws_secret_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    total_max_attempts=7,
    max_attempts=3,
    mode="adaptive",
)

print("Retry Configuration:")
print(f"  - total_max_attempts: {client._total_max_attempts}")
print(f"  - max_attempts: {client._max_attempts}")
print(f"  - mode: {client._mode}")
print(f"  - retry_config dict: {client._retry_config}")

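# Note: the underscore-prefixed attributes above are internal to BedrockClient
# and may change between AG2 versions.
# When both total_max_attempts and max_attempts are provided, total_max_attempts
# takes precedence, and boto3 may drop max_attempts when it normalizes the settings.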

## Part 8: Environment Variable Configuration

You can also configure retries via environment variables:

In [None]:
# Set environment variables for retry configuration
# Note: These are boto3/botocore environment variables
os.environ["AWS_MAX_ATTEMPTS"] = "10"  # Maps to total_max_attempts

print("Environment variable configured:")
print(f"  AWS_MAX_ATTEMPTS: {os.environ.get('AWS_MAX_ATTEMPTS')}")

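# When using environment variables, you don't need to specify
# total_max_attempts in the config_list.
# Caveat: boto3 gives an explicit retries setting precedence over AWS_MAX_ATTEMPTS,
# so the variable only takes effect if no attempt count reaches the boto3 client config.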
llm_config_env = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        # total_max_attempts will be read from AWS_MAX_ATTEMPTS env var
        "mode": "adaptive",
    },
)

print("Configuration using environment variable created!")
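
The lookup order can be sketched as follows. `resolve_total_attempts` is a hypothetical helper, and the values in `MODE_DEFAULTS` are the boto3 defaults as described in its retry guide (5 total attempts for `legacy`, 3 for `standard` and `adaptive`). AG2 may apply its own defaults before boto3 sees the configuration, which this sketch does not model.

```python
import os

# Assumed boto3 defaults (total attempts) when nothing else is configured.
MODE_DEFAULTS = {"legacy": 5, "standard": 3, "adaptive": 3}


def resolve_total_attempts(explicit=None, mode="standard", environ=os.environ):
    """Pick the effective total attempts: explicit config, then env var, then default."""
    if explicit is not None:
        return explicit
    env_value = environ.get("AWS_MAX_ATTEMPTS")
    if env_value is not None:
        return int(env_value)
    return MODE_DEFAULTS[mode]


print(resolve_total_attempts(explicit=8, environ={"AWS_MAX_ATTEMPTS": "10"}))  # 8
print(resolve_total_attempts(environ={"AWS_MAX_ATTEMPTS": "10"}))  # 10
print(resolve_total_attempts(mode="legacy", environ={}))  # 5
```

boto3 also reads the retry mode from the `AWS_RETRY_MODE` environment variable when no mode is set in the client configuration.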

## Part 9: Best Practices

### 1. Choose the Right Retry Mode

- **Use `standard`** for most applications (default, well-tested)
- **Use `adaptive`** when dealing with rate limits or high-throughput scenarios
- **Use `legacy`** only for backward compatibility

### 2. Set Appropriate Total Max Attempts

- **Low (2-3)**: For fast-fail scenarios or when errors are likely permanent
- **Medium (5-7)**: For most applications (good balance)
- **High (10+)**: For critical applications or when dealing with unreliable networks
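
To get a feel for what these ranges cost in waiting time, the sketch below sums the largest possible backoff before each retry. It assumes delays that double from 1 second and are capped at 20 seconds, which is an illustrative assumption; real delays are jittered and usually shorter. `max_total_backoff` is a hypothetical helper.

```python
def max_total_backoff(total_max_attempts: int, base: float = 2.0, cap: float = 20.0) -> float:
    """Upper bound on seconds spent sleeping between attempts (no jitter)."""
    retries = total_max_attempts - 1  # the first attempt is not preceded by a wait
    return sum(min(base**i, cap) for i in range(retries))


for attempts in (2, 5, 10):
    print(attempts, max_total_backoff(attempts))
# 2 1.0
# 5 15.0
# 10 111.0
```

Under these assumptions, going from 5 to 10 attempts raises the worst-case waiting time from 15 seconds to almost two minutes, which matters for interactive workloads.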

### 3. Prefer `total_max_attempts` over `max_attempts`

- `total_max_attempts` is the preferred parameter
- It aligns with AWS environment variables
- It's more intuitive (total attempts vs retry attempts)

### 4. Combine with Timeout Configuration

In [None]:
# Combine retry config with timeout
llm_config_with_timeout = LLMConfig(
    config_list={
        "api_type": "bedrock",
        "model": "qwen.qwen3-coder-480b-a35b-v1:0",
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "total_max_attempts": 5,
        "mode": "adaptive",
        "timeout": 60,  # 60 seconds timeout per request
    },
)

print("Configuration with timeout created!")
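
Retries multiply the time a single call can block. The sketch below gives a rough upper bound, assuming the timeout applies to each attempt separately (an assumption about how the setting reaches boto3). `worst_case_request_seconds` and its `backoff_budget` parameter are hypothetical names.

```python
def worst_case_request_seconds(total_max_attempts: int, timeout: float, backoff_budget: float = 0.0) -> float:
    """Upper bound on how long one call can block if every attempt times out."""
    return total_max_attempts * timeout + backoff_budget


# Settings from the cell above: 5 attempts, 60-second timeout.
print(worst_case_request_seconds(5, 60))  # 300.0
# The same settings with an assumed 15 seconds of waiting between attempts.
print(worst_case_request_seconds(5, 60, backoff_budget=15.0))  # 315.0
```

If that bound is too long for your application, lower either the timeout or the attempt count rather than only one of them.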

## Part 10: Error Handling with Retries

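Retries happen inside the client, so an exception reaches your code only after the configured attempts are exhausted or when the error is not retryable. Catch it at the call site: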

In [None]:
# Agent with comprehensive retry configuration
agent_with_retries = ConversableAgent(
    name="retry_agent",
    llm_config=llm_config_adaptive,
    system_message="You are a helpful assistant.",
    max_consecutive_auto_reply=1,
    human_input_mode="NEVER",
)


def test_with_error_handling():
    """Test agent with error handling."""
    try:
        response = agent_with_retries.run(
            message="Hello, how are you?",
            max_turns=1,
        )
        # process() is called for its side effects; return the response object itself
        response.process()
        return response
    except Exception as e:
        print(f"Error after retries: {type(e).__name__}: {e}")
        # The retry mechanism should have already attempted multiple times
        raise


# Test the error handling
print("=== Testing Error Handling ===")
result = test_with_error_handling()
print("Test completed!")
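
When a call still fails, it helps to decide whether more retries could have helped at all. The sketch below groups a few Bedrock error codes into transient and permanent. The grouping and the helper `describe_failure` are illustrative assumptions, not the rules boto3 applies, and how the error code is exposed on the raised exception depends on how AG2 surfaces it.

```python
# Illustrative grouping of error codes; adjust it to the errors you actually observe.
TRANSIENT_CODES = {
    "ThrottlingException",
    "ServiceUnavailableException",
    "InternalServerException",
    "ModelTimeoutException",
}
PERMANENT_CODES = {
    "ValidationException",
    "AccessDeniedException",
    "ResourceNotFoundException",
}


def describe_failure(error_code: str) -> str:
    """Return a short hint on whether changing retry settings is likely to help."""
    if error_code in TRANSIENT_CODES:
        return "transient: attempts were exhausted; consider more attempts or adaptive mode"
    if error_code in PERMANENT_CODES:
        return "permanent: fix the request or credentials; retrying will not help"
    return "unknown: inspect the error before changing retry settings"


print(describe_failure("ThrottlingException"))
print(describe_failure("ValidationException"))
```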

## Part 11: Comparison Table

| Configuration | total_max_attempts | mode | Use Case |
|--------------|-------------------|------|----------|
| Default | 5 | standard | General purpose |
| High Reliability | 10 | adaptive | Critical applications |
| Fast Fail | 2 | standard | Quick failure detection |
| Rate Limit Optimized | 8 | adaptive | High-throughput scenarios |
| Legacy | 5 | legacy | Backward compatibility |

## Part 12: Advanced: Custom Retry Configuration per Agent

Create multiple agents with different retry configurations:

In [None]:
# Agent 1: Fast responses (fewer retries)
fast_agent = ConversableAgent(
    name="fast_agent",
    llm_config=LLMConfig(
        config_list={
            "api_type": "bedrock",
            "model": "qwen.qwen3-coder-480b-a35b-v1:0",
            "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
            "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
            "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
            "total_max_attempts": 3,
            "mode": "standard",
        },
    ),
    system_message="You provide quick responses.",
    max_consecutive_auto_reply=1,
)

# Agent 2: Reliable responses (more retries)
reliable_agent = ConversableAgent(
    name="reliable_agent",
    llm_config=LLMConfig(
        config_list={
            "api_type": "bedrock",
            "model": "qwen.qwen3-coder-480b-a35b-v1:0",
            "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
            "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
            "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
            "total_max_attempts": 10,
            "mode": "adaptive",
        },
    ),
    system_message="You provide reliable responses with retries.",
    max_consecutive_auto_reply=1,
)

print("Multiple agents created with different retry configurations!")
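
The configuration dictionaries in this notebook differ only in their retry settings, so they can be generated from named presets. The sketch below is one way to do that: `RETRY_PRESETS` and `bedrock_config` are hypothetical names, and the preset values are copied from the comparison table in Part 11.

```python
import os

# Retry presets copied from the comparison table.
RETRY_PRESETS = {
    "default": {"total_max_attempts": 5, "mode": "standard"},
    "high_reliability": {"total_max_attempts": 10, "mode": "adaptive"},
    "fast_fail": {"total_max_attempts": 2, "mode": "standard"},
    "rate_limit": {"total_max_attempts": 8, "mode": "adaptive"},
    "legacy": {"total_max_attempts": 5, "mode": "legacy"},
}


def bedrock_config(preset: str, model: str = "qwen.qwen3-coder-480b-a35b-v1:0") -> dict:
    """Build a Bedrock config dict with the retry settings of the named preset."""
    return {
        "api_type": "bedrock",
        "model": model,
        "aws_region": os.getenv("AWS_REGION", "eu-north-1"),
        "aws_access_key": os.getenv("AWS_ACCESS_KEY"),
        "aws_secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        **RETRY_PRESETS[preset],  # merge in total_max_attempts and mode
    }


config = bedrock_config("fast_fail")
print(config["total_max_attempts"], config["mode"])  # 2 standard
```

The resulting dictionary can be passed the same way as the hand-written ones, for example `LLMConfig(config_list=bedrock_config("fast_fail"))`.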

## Summary

In this notebook, we've learned:

1. ✅ How to configure retry behavior for Bedrock API calls
2. ✅ What the `total_max_attempts`, `max_attempts`, and `mode` parameters do
3. ✅ How the `legacy`, `standard`, and `adaptive` retry modes differ
4. ✅ Best practices for choosing retry configurations
5. ✅ How to combine retry config with timeout settings
6. ✅ How to handle errors that persist after retries
7. ✅ How to configure retries through environment variables

## Key Takeaways

- **Use `total_max_attempts`** (preferred over `max_attempts`)
- **Use `adaptive` mode** for rate limit handling
- **Use `standard` mode** for general-purpose applications
- **Set appropriate attempt counts** based on your reliability needs
- **Combine with timeout** for better control

## Next Steps

- Experiment with different retry configurations for your use case
- Monitor retry behavior in production
- Adjust retry settings based on error patterns
- Consider using `adaptive` mode for high-throughput scenarios

## References

- [AG2 Documentation](https://docs.ag2.ai)
- [Bedrock Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html)
- [boto3 Retry Configuration](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html)
- [AWS SDK Retry Behavior](https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html)