# Bedrock Inference Profiles
Explore System-Defined vs Application Inference Profiles

In [1]:
import boto3
import json
import time

REGION = 'us-east-1'
bedrock = boto3.client('bedrock', region_name=REGION)
bedrock_runtime = boto3.client('bedrock-runtime', region_name=REGION)

print(f"Region: {REGION}")

Region: us-east-1


## 1. List System-Defined Inference Profiles

In [13]:
# List system inference profiles
response = bedrock.list_inference_profiles()

print("System-Defined Inference Profiles:\n")
for profile in response['inferenceProfileSummaries']:
    if profile['type'] == 'SYSTEM_DEFINED':
        print(f"Name: {profile['inferenceProfileName']}")
        print(f"ARN: {profile['inferenceProfileArn']}")
        print(f"ID: {profile['inferenceProfileId']}")
        print(f"Type: {profile['type']}")
        print(f"Status: {profile['status']}")
        print(f"Models: {profile.get('models', [])}")
        print("-" * 80)

System-Defined Inference Profiles:

Name: US Anthropic Claude 3 Sonnet
ARN: arn:aws:bedrock:us-east-1:058264544288:inference-profile/us.anthropic.claude-3-sonnet-20240229-v1:0
ID: us.anthropic.claude-3-sonnet-20240229-v1:0
Type: SYSTEM_DEFINED
Status: ACTIVE
Models: [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'}, {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'}]
--------------------------------------------------------------------------------
Name: US Anthropic Claude 3 Opus
ARN: arn:aws:bedrock:us-east-1:058264544288:inference-profile/us.anthropic.claude-3-opus-20240229-v1:0
ID: us.anthropic.claude-3-opus-20240229-v1:0
Type: SYSTEM_DEFINED
Status: ACTIVE
Models: [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-opus-20240229-v1:0'}, {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-opus-20240229-v1:0'}]
-------------------------

## 2. Understanding System-Defined Profiles

**System-Defined Profiles** are managed by AWS and provide:
- **Cross-region routing** - Automatically route to available regions
- **High availability** - Failover to other regions if one is unavailable
- **Load balancing** - Distribute traffic across regions
- **No configuration** - Ready to use immediately

**Use Cases:**
- Production applications requiring high availability
- Global applications with users in multiple regions
- Applications sensitive to throttling

In [12]:
# Get details of a system profile
system_profiles = [p for p in response['inferenceProfileSummaries'] if p['type'] == 'SYSTEM_DEFINED']

if system_profiles:
    profile_id = system_profiles[10]['inferenceProfileId']
    
    detail = bedrock.get_inference_profile(inferenceProfileIdentifier=profile_id)
    
    print("System Profile Details:\n")
    print(json.dumps(detail, indent=2, default=str))

System Profile Details:

{
  "ResponseMetadata": {
    "RequestId": "8d5dd7af-b074-408b-b46c-1af1fee5a747",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Mon, 08 Dec 2025 08:09:38 GMT",
      "content-type": "application/json",
      "content-length": "739",
      "connection": "keep-alive",
      "x-amzn-requestid": "8d5dd7af-b074-408b-b46c-1af1fee5a747"
    },
    "RetryAttempts": 0
  },
  "inferenceProfileName": "US Meta Llama 3.1 70B Instruct",
  "description": "Routes requests to Meta Llama 3.1 70B Instruct in us-east-1, us-east-2 and us-west-2.",
  "createdAt": "2024-11-01 00:00:00+00:00",
  "updatedAt": "2024-11-04 18:04:24.303028+00:00",
  "inferenceProfileArn": "arn:aws:bedrock:us-east-1:058264544288:inference-profile/us.meta.llama3-1-70b-instruct-v1:0",
  "models": [
    {
      "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/meta.llama3-1-70b-instruct-v1:0"
    },
    {
      "modelArn": "arn:aws:bedrock:us-east-2::foundation-model/meta.llama3-1-70b

## 3. Test System-Defined Profile

In [None]:
# Use system profile for inference
SYSTEM_PROFILE_ID = 'us.amazon.nova-2-lite-v1:0'  # Cross-region profile

response = bedrock_runtime.converse(
    modelId=SYSTEM_PROFILE_ID,
    messages=[{
        'role': 'user',
        'content': [{'text': 'Explain inference profiles in one sentence'}]
    }]
)

print("System Profile Response:")
print(response['output']['message']['content'][0]['text'])
print(f"\nStop Reason: {response['stopReason']}")
print(f"Usage: {response['usage']}")

System Profile Response:
An **inference profile** is a set of predefined configurations that define how a model processes input data for inference, including details such as batch size, sequence length, precision (e.g., FP32, FP16, INT8), threading options, and hardware-specific settings, enabling optimized and consistent model deployment. 

### One-Sentence Summary:
An inference profile specifies the configuration and optimization settings (like precision, batch size, and hardware utilization) used to run a machine learning model efficiently during prediction.

Stop Reason: end_turn
Usage: {'inputTokens': 52, 'outputTokens': 103, 'totalTokens': 155}


## 4. Create Application Inference Profile

In [None]:
# Create application inference profile
APP_PROFILE_NAME = 'my-app-profile'

try:
    response = bedrock.create_inference_profile(
        inferenceProfileName=APP_PROFILE_NAME,
        description='Custom profile for my application',
        modelSource={
            'copyFrom': 'arn:aws:bedrock:us-east-1:058264544288:inference-profile/us.amazon.nova-2-lite-v1:0'
        },
        tags={

        }
    )
    
    APP_PROFILE_ARN = response['inferenceProfileArn']
    print(f"✓ Created application profile: {APP_PROFILE_ARN}")
except bedrock.exceptions.ValidationException as e:
    print(f"Note: {e}")
    # List existing app profiles
    profiles = bedrock.list_inference_profiles()['inferenceProfileSummaries']
    app_profiles = [p for p in profiles if p['type'] == 'APPLICATION']
    if app_profiles:
        APP_PROFILE_ARN = app_profiles[0]['inferenceProfileArn']
        print(f"Using existing profile: {APP_PROFILE_ARN}")

✓ Created application profile: arn:aws:bedrock:us-east-1:058264544288:application-inference-profile/yaec80410apl


## 5. Understanding Application Inference Profiles

**Application Inference Profiles** provide:
- **Cost tracking** - Track usage per application/team
- **Usage monitoring** - Separate metrics for different apps
- **Access control** - IAM policies per profile
- **Quota management** - Separate rate limits

**Use Cases:**
- Multi-tenant applications
- Cost allocation across teams
- Separate dev/staging/prod environments
- Chargeback to business units

In [19]:
# List application profiles
response = bedrock.list_inference_profiles()

print("Application Inference Profiles:\n")
for profile in response['inferenceProfileSummaries']:
    if profile['type'] == 'APPLICATION':
        print(f"Name: {profile['inferenceProfileName']}")
        print(f"ARN: {profile['inferenceProfileArn']}")
        print(f"Type: {profile['type']}")
        print(f"Status: {profile['status']}")
        print("-" * 80)

Application Inference Profiles:



## 6. Compare System vs Application Profiles

In [20]:
comparison = {
    'Feature': [
        'Cross-Region Routing',
        'High Availability',
        'Cost Tracking',
        'Usage Monitoring',
        'Custom IAM Policies',
        'Setup Required',
        'Management',
        'Best For'
    ],
    'System-Defined': [
        '✓ Yes',
        '✓ Yes',
        '✗ No',
        '✗ No',
        '✗ No',
        '✓ None',
        'AWS Managed',
        'High availability, global apps'
    ],
    'Application': [
        '✓ Inherits from base',
        '✓ Inherits from base',
        '✓ Yes',
        '✓ Yes',
        '✓ Yes',
        '✗ Must create',
        'User Managed',
        'Cost tracking, multi-tenant'
    ]
}

import pandas as pd
df = pd.DataFrame(comparison)
print(df.to_string(index=False))

             Feature                 System-Defined                 Application
Cross-Region Routing                          ✓ Yes        ✓ Inherits from base
   High Availability                          ✓ Yes        ✓ Inherits from base
       Cost Tracking                           ✗ No                       ✓ Yes
    Usage Monitoring                           ✗ No                       ✓ Yes
 Custom IAM Policies                           ✗ No                       ✓ Yes
      Setup Required                         ✓ None               ✗ Must create
          Management                    AWS Managed                User Managed
            Best For High availability, global apps Cost tracking, multi-tenant


## 7. Test Application Profile

In [21]:
# Use application profile (if created)
try:
    response = bedrock_runtime.converse(
        modelId=APP_PROFILE_ARN,
        messages=[{
            'role': 'user',
            'content': [{'text': 'What are the benefits of application inference profiles?'}]
        }]
    )
    
    print("Application Profile Response:")
    print(response['output']['message']['content'][0]['text'])
    print(f"\nUsage tracked under: {APP_PROFILE_ARN}")
except Exception as e:
    print(f"Note: {e}")
    print("Application profiles may not be available in all regions")

Application Profile Response:
### **Benefits of Application Inference Profiles**

**Application Inference Profiles** are configuration templates or sets of rules that define how an application should be analyzed, monitored, or optimized during the **inference phase**—the stage where a trained machine learning (ML) model is deployed to make predictions on new, real-world data. These profiles are commonly used in **MLOps**, **model serving**, **cloud-based AI services**, and **edge computing** environments.

Using inference profiles provides numerous benefits across various domains, including performance, cost-efficiency, scalability, and maintainability. Below are the **key benefits**:

---

## ✅ **1. Optimized Performance**
Inference profiles allow you to **tailor the execution environment** to the specific needs of an application, such as:
- **Latency requirements** (e.g., real-time vs. batch processing)
- **Throughput needs**
- **Hardware acceleration** (e.g., GPU, TPU, NPU, or CPU-o

## 8. Cost Tracking Example

In [22]:
# Example: Multiple application profiles for different teams
team_profiles = [
    {'name': 'team-a-profile', 'description': 'Team A - Customer Service'},
    {'name': 'team-b-profile', 'description': 'Team B - Analytics'},
    {'name': 'team-c-profile', 'description': 'Team C - Marketing'}
]

print("Multi-Team Setup Example:\n")
for team in team_profiles:
    print(f"Profile: {team['name']}")
    print(f"Purpose: {team['description']}")
    print(f"Benefit: Separate cost tracking and usage metrics")
    print("-" * 80)

Multi-Team Setup Example:

Profile: team-a-profile
Purpose: Team A - Customer Service
Benefit: Separate cost tracking and usage metrics
--------------------------------------------------------------------------------
Profile: team-b-profile
Purpose: Team B - Analytics
Benefit: Separate cost tracking and usage metrics
--------------------------------------------------------------------------------
Profile: team-c-profile
Purpose: Team C - Marketing
Benefit: Separate cost tracking and usage metrics
--------------------------------------------------------------------------------


## 9. IAM Policy Example for Application Profile

In [None]:
# Example IAM policy for application profile
iam_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                f"arn:aws:bedrock:{REGION}:*:inference-profile/my-app-profile"
            ]
        },
        {
            "Effect": "Deny",
            "Action": "bedrock:InvokeModel",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "bedrock:InferenceProfileArn": f"arn:aws:bedrock:{REGION}:*:inference-profile/my-app-profile"
                }
            }
        }
    ]
}

print("IAM Policy for Application Profile:\n")
print(json.dumps(iam_policy, indent=2))
print("\nThis policy:")
print("- Allows access only to specific application profile")
print("- Denies access to other models/profiles")
print("- Enables cost tracking per team/application")

## 10. Performance Comparison

In [None]:
# Compare latency (system profile has cross-region routing)
prompt = "What is cloud computing?"

# Test system profile
start = time.time()
response = bedrock_runtime.converse(
    modelId='us.anthropic.claude-3-5-sonnet-20241022-v2:0',
    messages=[{'role': 'user', 'content': [{'text': prompt}]}]
)
system_time = time.time() - start

print(f"System Profile Latency: {system_time:.2f}s")
print(f"Tokens: {response['usage']['totalTokens']}")
print(f"\nNote: System profiles may route to different regions for optimal performance")

## 11. Decision Guide

In [None]:
decision_guide = {
    'Scenario': [
        'Single application, need high availability',
        'Multiple teams, need cost tracking',
        'Global users, need low latency',
        'Dev/Staging/Prod separation',
        'Multi-tenant SaaS application',
        'Simple prototype/testing',
        'Compliance: data residency required',
        'Need separate quotas per team'
    ],
    'Recommendation': [
        'System-Defined',
        'Application',
        'System-Defined',
        'Application (one per env)',
        'Application (one per tenant)',
        'System-Defined',
        'Direct model (not profile)',
        'Application'
    ],
    'Reason': [
        'Cross-region failover',
        'Cost allocation',
        'Automatic routing',
        'Usage tracking',
        'Per-tenant metrics',
        'No setup needed',
        'Region control',
        'Separate limits'
    ]
}

df = pd.DataFrame(decision_guide)
print("\nDecision Guide:\n")
print(df.to_string(index=False))

## Summary

### System-Defined Inference Profiles:

**Pros:**
- ✓ Cross-region routing for high availability
- ✓ Automatic failover
- ✓ No setup required
- ✓ AWS managed

**Cons:**
- ✗ No cost tracking per application
- ✗ No custom IAM policies
- ✗ Shared usage metrics

**Best For:**
- Production apps needing high availability
- Global applications
- Simple deployments

### Application Inference Profiles:

**Pros:**
- ✓ Cost tracking per application/team
- ✓ Separate usage metrics
- ✓ Custom IAM policies
- ✓ Quota management

**Cons:**
- ✗ Requires setup
- ✗ User managed
- ✗ Additional complexity

**Best For:**
- Multi-tenant applications
- Cost allocation across teams
- Separate environments (dev/staging/prod)
- Chargeback scenarios

### Key Takeaways:

1. **Use System-Defined** for high availability and simplicity
2. **Use Application** for cost tracking and multi-tenancy
3. **Both inherit** cross-region routing capabilities
4. **Application profiles** enable better governance
5. **Choose based on** organizational needs, not just technical requirements