# Guardrails

In this notebook, we will explore the use of guardrails in Amazon Bedrock, a feature designed to control and restrict agent behaviour during interactions.

Guardrails allow you to:

- Prevent unwanted inputs (e.g., sensitive information or unsafe commands).

- Restrict inappropriate outputs from the agent.

- Keep conversations within safe and defined boundaries, ensuring that the agent follows specific organisational policies or rules.

This notebook covers:

- An introduction to the types of guardrails available.

- Creating and configuring input and output rules.

- Associating guardrails with an existing agent.

- Behaviour testing with restricted input examples.

Documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html

In [56]:
import boto3
import json
import uuid

from utils import invoke_agent_helper

In [18]:
session = boto3.session.Session()
region = session.region_name

bedrock_client = boto3.client('bedrock')
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')
bedrock_runtime = boto3.client(service_name='bedrock-runtime', region_name=region)

In [15]:
# replace with your own IDs
agent_id = 'XXXXXXXXX'
alias_id = 'TSTALIASID'  # built-in test alias; it points to the working draft of any agent

kb_id = 'XXXXXXXXX'

model_sonnet_id = "eu.anthropic.claude-3-7-sonnet-20250219-v1:0"
model_arn = 'arn:aws:bedrock:xx-xxxxx-x:XXXXXXXXX:inference-profile/eu.anthropic.claude-3-7-sonnet-20250219-v1:0'

# Create Guardrails
We can specify filters, topics, words, etc.

In [4]:
# define topics

topics_config = {
    'topicsConfig': [
        {
            'name': 'Cooking and Recipes',
            'definition': 'Prevent the agent from giving cooking instructions, recipes, or food preparation advice.',
            'examples': [
                'How do I make a chocolate cake?',
                'Give me a recipe for spaghetti carbonara.',
                'Step-by-step instructions to bake cookies.'
            ],
            'type': 'DENY'
        },
        {
            'name': 'Funny Warnings',
            'definition': 'Warn users about potentially off-topic or silly requests that are outside the agent’s scope.',
            'examples': [
                'Tell me how to turn my house into a giant pancake.',
                'Give me a recipe for a magic potion.'
            ],
            'type': 'DENY'
        }
    ]
}
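
Before sending the topics to the API, it can help to sanity-check them locally. The following is a minimal sketch: `validate_topics` is a hypothetical helper (not part of boto3), and the limits it checks (200 characters per definition, at most 5 examples of up to 100 characters each) are assumptions that should be verified against the current API reference.

```python
# Assumed limits; verify against the current CreateGuardrail API reference.
MAX_DEFINITION_CHARS = 200
MAX_EXAMPLES = 5
MAX_EXAMPLE_CHARS = 100


def validate_topics(topics_config):
    """Return a list of human-readable problems found in a topics config dict."""
    problems = []
    for topic in topics_config.get('topicsConfig', []):
        name = topic.get('name', '<unnamed>')
        if len(topic.get('definition', '')) > MAX_DEFINITION_CHARS:
            problems.append(f"{name}: definition exceeds {MAX_DEFINITION_CHARS} characters")
        examples = topic.get('examples', [])
        if len(examples) > MAX_EXAMPLES:
            problems.append(f"{name}: more than {MAX_EXAMPLES} examples")
        for example in examples:
            if len(example) > MAX_EXAMPLE_CHARS:
                problems.append(f"{name}: example exceeds {MAX_EXAMPLE_CHARS} characters")
    return problems


# Illustrative input with the same shape as topics_config above.
sample = {
    'topicsConfig': [
        {
            'name': 'Cooking and Recipes',
            'definition': 'Recipes and cooking instructions.',
            'examples': ['How do I make a chocolate cake?'],
            'type': 'DENY',
        }
    ]
}
print(validate_topics(sample))
```

An empty list means no problems were found under the assumed limits.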

In [5]:
# define content filters and word filters

content_config = {
        'filtersConfig': [
            {
                'type': 'SEXUAL',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'VIOLENCE',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'HATE',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'INSULTS',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'MISCONDUCT',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH'
            },
            {
                'type': 'PROMPT_ATTACK',
                'inputStrength': 'HIGH',
                'outputStrength': 'NONE'
            }
        ]
    }

word_config = {
    'wordsConfig': [
            {'text': 'fiduciary advice'},
            {'text': 'investment recommendations'},
            {'text': 'stock picks'},
            {'text': 'financial planning guidance'},
            {'text': 'portfolio allocation advice'},
            {'text': 'retirement fund suggestions'},
            {'text': 'trust fund setup'},
            {'text': 'investment strategy'},
            {'text': 'financial advisor recommendations'}
        ],
        'managedWordListsConfig': [
            {'type': 'PROFANITY'}
        ]
    }

In [6]:
sensitive_config = {
        'piiEntitiesConfig': [
            {'type': 'EMAIL', 'action': 'ANONYMIZE'},
            {'type': 'PHONE', 'action': 'ANONYMIZE'},
            {'type': 'NAME', 'action': 'ANONYMIZE'},
            {'type': 'US_SOCIAL_SECURITY_NUMBER', 'action': 'BLOCK'},
            {'type': 'US_BANK_ACCOUNT_NUMBER', 'action': 'BLOCK'},
            {'type': 'CREDIT_DEBIT_CARD_NUMBER', 'action': 'BLOCK'}
        ],
        'regexesConfig': [
            {
                'name': 'Account Number',
                'description': 'Matches account numbers in the format XXXXXX1234',
                'pattern': r'\b\d{6}\d{4}\b',
                'action': 'ANONYMIZE'
            }
        ]
    }
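
The regex can be tried locally before it is deployed in the guardrail. The sketch below uses Python's `re` module with the same pattern; `mask_accounts` and its placeholder text are illustrative assumptions, not the masking format Bedrock itself applies. Note that the pattern matches exactly ten consecutive digits, not literal `X` characters.

```python
import re

# Same pattern as in regexesConfig: exactly ten consecutive digits.
ACCOUNT_PATTERN = re.compile(r'\b\d{6}\d{4}\b')


def mask_accounts(text, placeholder='{Account Number}'):
    """Replace every match with a placeholder (the placeholder is illustrative)."""
    return ACCOUNT_PATTERN.sub(placeholder, text)


print(mask_accounts('Account 1234567890, ref 12345.'))
# prints: Account {Account Number}, ref 12345.
```

Shorter or longer digit runs (for example an 11-digit number) are left untouched because of the `\b` word boundaries.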

In [23]:
guardrailName = "chat-advice"

In [9]:
# create guardrail api

create_response = bedrock_client.create_guardrail(
    name=guardrailName,
    description='Prevents our model from providing recipe advice.',
    topicPolicyConfig=topics_config,
    contentPolicyConfig=content_config,
    wordPolicyConfig=word_config,
    sensitiveInformationPolicyConfig=sensitive_config,
    contextualGroundingPolicyConfig={
        'filtersConfig': [
            {
                'type': 'GROUNDING',
                'threshold': 0.75
            },
            {
                'type': 'RELEVANCE',
                'threshold': 0.75
            }
        ]
    },
    blockedInputMessaging="""Sorry, the model cannot answer this question. Please review the trace for more details.""",
    blockedOutputsMessaging="""Sorry, the model cannot answer this question. Please review the trace for more details.""",
    tags=[
        {'key': 'purpose', 'value': 'recipes-advice-prevention'},
        {'key': 'environment', 'value': 'production'}
    ]
)


In [None]:
create_response

In [None]:
# create a version
version_response = bedrock_client.create_guardrail_version(
    guardrailIdentifier=create_response['guardrailId'],
    description='First Version'
)
version_response

As with the agent and the knowledge base, we can access the guardrail through its ID.

In [14]:
guardrailId = version_response['guardrailId']
guardrailVersion = version_response['version']
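
As a minimal sketch of looking a guardrail up by ID, the helper below wraps the `get_guardrail` call and returns a few fields worth checking. `summarize_guardrail` is a hypothetical name; the client is passed in, so the `bedrock_client` created earlier can be used.

```python
def summarize_guardrail(client, guardrail_id, version=None):
    """Fetch a guardrail by ID and return its name, version and status."""
    kwargs = {'guardrailIdentifier': guardrail_id}
    if version is not None:
        # When the version is omitted, the working draft is returned.
        kwargs['guardrailVersion'] = version
    details = client.get_guardrail(**kwargs)
    return {key: details.get(key) for key in ('name', 'version', 'status')}
```

For example, `summarize_guardrail(bedrock_client, guardrailId, guardrailVersion)` should report a status of `READY` once the version can be used.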

# Invoke Model

We have now defined our Guardrail. Before adding it to the Agent, we can test it by adding it to *invoke_model*.

In [20]:
# Create request body for Claude 3.7 Sonnet

prompt = "What should I cook tonight?"

claude_body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "temperature": 0.5,
    "top_p": 0.9,
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": prompt}]
        }
    ],
})

In [21]:
response = bedrock_runtime.invoke_model(
        modelId=model_sonnet_id,
        body=claude_body,
        accept="application/json",
        contentType="application/json",
        guardrailIdentifier=guardrailId,
        guardrailVersion=guardrailVersion,
        trace="ENABLED"
    )

response_body = json.loads(response.get('body').read())

In [None]:
response_body
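
To check whether the guardrail stepped in, the response body can be inspected programmatically. This is a sketch under the assumption that the body carries the keys `amazon-bedrock-guardrailAction` and `amazon-bedrock-trace` when a guardrail is attached; `guardrail_summary` is a hypothetical helper.

```python
def guardrail_summary(response_body):
    """Summarize a parsed invoke_model response body for a Claude messages call."""
    # Assumed key names; confirm them by printing response_body once.
    action = response_body.get('amazon-bedrock-guardrailAction', 'NONE')
    texts = [
        part.get('text', '')
        for part in response_body.get('content', [])
        if part.get('type') == 'text'
    ]
    return {
        'intervened': action == 'INTERVENED',
        'text': ''.join(texts),
        'trace': response_body.get('amazon-bedrock-trace'),
    }
```

Calling `guardrail_summary(response_body)` on the response above should show whether the cooking question was blocked and, if so, return the configured blocked message as the text.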

Currently, the guardrail only works with text, but we can also adapt it to filter images.

## Guardrail with images

Here we update the guardrail so that its content filters also apply to images. (Image filtering was not enabled in this region.)

In [None]:
update_guardrail_response = bedrock_client.update_guardrail(
        guardrailIdentifier=guardrailId,
        name=guardrailName,
        description='Detect and block harmful images.',
        contentPolicyConfig={
            'filtersConfig': [
                {
                    'type': 'SEXUAL',
                    'inputStrength': 'HIGH',
                    'outputStrength': 'HIGH',
                    'inputModalities': ['TEXT', 'IMAGE'],
                    'outputModalities': ['TEXT', 'IMAGE']
                },
                {
                    'type': 'VIOLENCE',
                    'inputStrength': 'HIGH',
                    'outputStrength': 'HIGH',
                    'inputModalities': ['TEXT', 'IMAGE'],
                    'outputModalities': ['TEXT', 'IMAGE']
                },
                {
                    'type': 'HATE',
                    'inputStrength': 'HIGH',
                    'outputStrength': 'HIGH',
                    'inputModalities': ['TEXT', 'IMAGE'],
                    'outputModalities': ['TEXT', 'IMAGE']
                },
                {
                    'type': 'INSULTS',
                    'inputStrength': 'HIGH',
                    'outputStrength': 'HIGH',
                    'inputModalities': ['TEXT', 'IMAGE'],
                    'outputModalities': ['TEXT', 'IMAGE']
                },
                {
                    'type': 'MISCONDUCT',
                    'inputStrength': 'HIGH',
                    'outputStrength': 'HIGH',
                    'inputModalities': ['TEXT', 'IMAGE'],
                    'outputModalities': ['TEXT', 'IMAGE']
                },
                {
                    'type': 'PROMPT_ATTACK',
                    'inputStrength': 'HIGH',
                    'outputStrength': 'NONE',
                    'inputModalities': ['TEXT', 'IMAGE'],
                    'outputModalities': ['TEXT', 'IMAGE']
                }
            ]
        },
        blockedInputMessaging='Sorry, the model cannot answer this question. Please review the trace for more details.',
        blockedOutputsMessaging='Sorry, the model cannot answer this question. Please review the trace for more details.',
    )

In [None]:
# create guardrail version
version_response = bedrock_client.create_guardrail_version(
    guardrailIdentifier=update_guardrail_response['guardrailId'],
    description='Version of Guardrail that has an image filter'
)

# get id
guardrailId = version_response['guardrailId']
guardrailVersion = version_response['version']

### Test images

In [None]:
# configure message and guardrail config

input_text = "Describe the image"
image_path = "test-image1.jpg"

with open(image_path, 'rb') as image:
    image_bytes = image.read()

message = {
            "role": "user",
            "content": [
                {"text": input_text},                
                {
                    "image": {
                        "format": "jpeg",
                        "source": {"bytes": image_bytes}
                    }
                }            
            ]
        }

guardrail_config = {
            "guardrailIdentifier": guardrailId,
            "guardrailVersion": guardrailVersion
        }
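
The message construction above hard-codes the `jpeg` format. As a sketch, a small helper can infer the format from the file extension instead. `build_image_message` is a hypothetical name, and the format table is an assumption about what the Converse API accepts; guardrail image filters may support fewer formats.

```python
from pathlib import Path

# Assumed mapping from file extension to the Converse API image format.
IMAGE_FORMATS = {'.jpg': 'jpeg', '.jpeg': 'jpeg', '.png': 'png', '.gif': 'gif', '.webp': 'webp'}


def build_image_message(text, image_path):
    """Build a user message with a text block and an image block."""
    path = Path(image_path)
    suffix = path.suffix.lower()
    if suffix not in IMAGE_FORMATS:
        raise ValueError(f"Unsupported image type: {suffix or '<none>'}")
    return {
        'role': 'user',
        'content': [
            {'text': text},
            {'image': {'format': IMAGE_FORMATS[suffix], 'source': {'bytes': path.read_bytes()}}},
        ],
    }
```

With this helper, `message = build_image_message(input_text, image_path)` produces the same structure as the cell above.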


In [None]:
# Make API call
response = bedrock_runtime.converse(
            modelId=model_sonnet_id,
            messages=[message],
            guardrailConfig=guardrail_config
)
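
The `converse` response can be checked in a similar way. The sketch below assumes that a blocked request is reported through a `stopReason` of `guardrail_intervened`; `read_converse_response` is a hypothetical helper.

```python
def read_converse_response(response):
    """Extract the reply text and whether the guardrail blocked the request."""
    blocks = response.get('output', {}).get('message', {}).get('content', [])
    text = ''.join(block['text'] for block in blocks if 'text' in block)
    return {
        'blocked': response.get('stopReason') == 'guardrail_intervened',
        'text': text,
    }
```

For a harmful test image, `read_converse_response(response)` would be expected to return `blocked` as `True` together with the configured blocked message.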

# Associate Guardrail with Agent

We associate our Agent with the Guardrail configuration.

In [45]:
# create guardrail config
guardrail_config = {
            "guardrailIdentifier": guardrailId,
            "guardrailVersion": guardrailVersion
        }

In [37]:
# get our agent information needed to update
agent_response = bedrock_agent_client.get_agent(
            agentId=agent_id)

In [46]:
agent_details = agent_response.get('agent')
agent_details['guardrailConfiguration'] = guardrail_config

In [42]:
# Get only necessary params to use update_agent

_promptOverrideConfig = agent_details.get('promptOverrideConfiguration')
if _promptOverrideConfig:
    # keep only the prompt configurations that were explicitly overridden
    _promptOverrideConfig['promptConfigurations'] = [
        config for config in _promptOverrideConfig.get('promptConfigurations', [])
        if config['promptCreationMode'] == "OVERRIDDEN"
    ]

# drop read-only fields that update_agent does not accept
for key_to_remove in ['clientToken', 'createdAt', 'updatedAt', 'preparedAt', 'agentStatus', 'agentArn']:
    agent_details.pop(key_to_remove, None)
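
An alternative to removing known read-only keys is to keep only the keys that `update_agent` accepts, which also guards against other fields that `get_agent` may return. This is a sketch: the parameter set below is an assumption and should be checked against the boto3 version in use, and `to_update_agent_kwargs` is a hypothetical helper.

```python
# Parameters accepted by update_agent (assumed; verify against your boto3 version).
UPDATE_AGENT_PARAMS = frozenset({
    'agentId', 'agentName', 'agentResourceRoleArn', 'foundationModel',
    'instruction', 'description', 'idleSessionTTLInSeconds',
    'customerEncryptionKeyArn', 'promptOverrideConfiguration',
    'guardrailConfiguration', 'memoryConfiguration', 'agentCollaboration',
    'orchestrationType', 'customOrchestration',
})


def to_update_agent_kwargs(agent_details, allowed=UPDATE_AGENT_PARAMS):
    """Return a copy of agent_details restricted to the allowed parameters."""
    return {key: value for key, value in agent_details.items() if key in allowed}
```

The result could then be passed as `bedrock_agent_client.update_agent(**to_update_agent_kwargs(agent_details))`.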

In [None]:
agent_details

In [48]:
# update with new guardrail config
response = bedrock_agent_client.update_agent(**agent_details)
print(response)
response = bedrock_agent_client.prepare_agent(agentId=agent_id)

So far we have used the test alias (TSTALIASID), which points to the working draft. We can create a dedicated alias for traceability:

In [None]:
# create alias

response = bedrock_agent_client.create_agent_alias(
    agentAliasName='AgentWithGuardrail',
    agentId=agent_id,
    description='Test alias with Guardrails for Amazon Bedrock',
)

alias_id = response["agentAlias"]["agentAliasId"]
alias_id

In [None]:
# prepare agent

response = bedrock_agent_client.prepare_agent(agentId=agent_id)
response
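
Preparing an agent is asynchronous, so it can be useful to wait until the agent reports that it is ready before invoking it. The sketch below polls `get_agent` for the `agentStatus` field; `wait_until_prepared`, its timeout and its polling interval are illustrative assumptions.

```python
import time


def wait_until_prepared(client, agent_id, timeout=120, interval=5, sleep=time.sleep):
    """Poll get_agent until the agent status is PREPARED, or raise on failure/timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_agent(agentId=agent_id)['agent']['agentStatus']
        if status == 'PREPARED':
            return status
        if status == 'FAILED':
            raise RuntimeError(f"Agent {agent_id} failed to prepare")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Agent {agent_id} still in status {status}")
        sleep(interval)
```

For example, `wait_until_prepared(bedrock_agent_client, agent_id)` can be called right after `prepare_agent`.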

# Invoke Agent

Our agent is now updated with the guardrail. We invoke it with a request that the guardrail should block.

In [58]:
session_id = str(uuid.uuid1())
query = "What is my credit card number?"
session_state = {
    "promptSessionAttributes": {
        "name": "Emma"
    }
}

In [None]:
response = invoke_agent_helper(query, session_id, agent_id, alias_id, session_state=session_state)

Next, we ask a question that should not trigger the guardrail.

In [None]:
# a query that should not trigger the guardrail

query = "What information do you have?"
response = invoke_agent_helper(query, session_id, agent_id, alias_id, session_state=session_state)
response
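
The `invoke_agent_helper` function comes from the local `utils` module, which is not shown here. As a rough sketch of what such a helper might do (an assumption, not the actual implementation), the function below calls `invoke_agent` with tracing enabled, joins the streamed answer chunks, and collects any guardrail actions found in the trace events.

```python
def invoke_and_collect(client, query, session_id, agent_id, alias_id, session_state=None):
    """Invoke an agent and return (answer_text, guardrail_actions)."""
    kwargs = {
        'agentId': agent_id,
        'agentAliasId': alias_id,
        'sessionId': session_id,
        'inputText': query,
        'enableTrace': True,
    }
    if session_state:
        kwargs['sessionState'] = session_state
    response = client.invoke_agent(**kwargs)

    answer, guardrail_actions = [], []
    for event in response['completion']:
        if 'chunk' in event:
            answer.append(event['chunk']['bytes'].decode('utf-8'))
        elif 'trace' in event:
            # Assumed trace layout: event['trace']['trace']['guardrailTrace']['action'].
            guardrail = event['trace'].get('trace', {}).get('guardrailTrace')
            if guardrail:
                guardrail_actions.append(guardrail.get('action'))
    return ''.join(answer), guardrail_actions
```

It could be called with the runtime client created earlier, for example `invoke_and_collect(bedrock_agent_runtime_client, query, session_id, agent_id, alias_id, session_state)`. An action of `INTERVENED` in the returned list would indicate that the guardrail acted on the request.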

In [None]:
## Clean up

# delete the guardrail (uncomment to run)
#bedrock_client.delete_guardrail(guardrailIdentifier=guardrailId)
