# Deploying Granite Guardian 3.2 5b models in Amazon SageMaker

## Introduction to Granite Guardian 3.2 5b Models

Granite Guardian 3.2 5B is a thinned down version of Granite Guardian 3.1 8B designed to detect risks in prompts and responses. It can help with risk detection along many key dimensions catalogued in the [IBM AI Risk Atlas](https://www.ibm.com/docs/en/watsonx/saas?topic=ai-risk-atlas).

**The IBM Granite 3.2 family includes:**
- Granite 3.2 Language, Granite 3.1 Language and Granite 3.0 Language Models
- Granite Vision Models
- Granite Embedding Models
- Granite Code Models
- Granite Guardian Models
- Granite Time Series Models
- Granite Experiments, Geospatial Models

The Granite Guardian 3.2 5b model is part of IBM's Granite series and is designed for content moderation, safety filtering, and responsible AI use. It is a 5-billion-parameter model optimized to detect harmful, unsafe, or inappropriate content in prompts and responses.

IBM has released the Granite Guardian 3.2 5b models as open source under the permissive Apache 2.0 license, which allows both research and commercial use. The models are available on Amazon SageMaker JumpStart, the AWS Marketplace, and [Hugging Face](https://huggingface.co/ibm-granite).

In this notebook, we will deploy the Granite Guardian 3.2 5b models on Amazon SageMaker.

## Pre-requisites

- Before running this notebook, make sure you opened it from the model catalog in the Amazon SageMaker console.
- *Note*: Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
- Ensure that the IAM role used has **AmazonSageMakerFullAccess**.

## Contents

1. **Deploying Granite Guardian 3.2 5b models in Amazon SageMaker**
    - To subscribe to the model package
    - Select the model package
2. **Create an endpoint and perform real-time inference**
    - Define the endpoint configuration
    - Create the endpoint
3. **Run inference with the model**
    - Example: Detect self-harm risk in a user message
4. **Clean-up**
    - Delete the endpoint
    - Delete the model    

## Usage Instructions

You can run this notebook one cell at a time by using **Shift+Enter** to run a cell.

## Deploying Granite Guardian 3.2 5b models in Amazon SageMaker

### To subscribe to the model package:

1. Open the model package listing page [IBM Granite Guardian 3.2 5b ](https://aws.amazon.com/marketplace/pp/prodview-wb6hb4222lc3y)
2. On the AWS Marketplace listing, choose "Continue to subscribe".
3. On the "Subscribe to this software" page, review the EULA, pricing, and support terms, and choose "Accept Offer" if you and your organization agree to them.
4. Choose "Continue to configuration" and select a Region. The Product ARN that is displayed is the model package ARN you need to create a deployable model with the SageMaker Python SDK. Copy the ARN for your Region and paste it into `model_package_map` in the next section.
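To double-check that the Product ARN you copied belongs to the Region you are deploying to, you can split it into its parts. The sketch below assumes the standard layout `arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name>`; the helper `parse_model_package_arn` and the example ARN are hypothetical and for illustration only.

```python
def parse_model_package_arn(arn: str) -> dict:
    """Split a SageMaker model package ARN into its components.

    Assumes the layout arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name>.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    _, partition, _, region, account, resource = parts
    # The resource part looks like "model-package/<name>".
    resource_type, _, name = resource.partition("/")
    if resource_type != "model-package" or not name:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    return {"partition": partition, "region": region, "account": account, "name": name}


# Hypothetical ARN for illustration; use the one shown on your Marketplace listing.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-package-name"
print(parse_model_package_arn(example_arn)["region"])  # us-east-1
```

Comparing the parsed `region` with `sagemaker_session.boto_region_name` catches the common mistake of pasting an ARN from a different Region.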

### 1. Select the model package

Confirm that you opened this notebook from the model catalog in the Amazon SageMaker console, and that `model_package_map` below contains the model package ARN for your Region.

In [None]:
model_package_map = {
    "us-east-1": "",
    "us-east-2": "",
    "us-west-1": "",
    "us-west-2": "",
    "ca-central-1": "",
    "eu-central-1": "",
    "eu-west-1": "",
    "eu-west-2": "",
    "eu-west-3": "",
    "eu-north-1": "",
    "ap-southeast-1": "",
    "ap-southeast-2": "",
    "ap-northeast-2": "",
    "ap-northeast-1": "",
    "ap-south-1": "",
    "sa-east-1": ""
}

In [None]:
!pip install --upgrade pip
!pip install -U sagemaker -q

In [None]:
import json
import pprint
from datetime import datetime

import boto3
import sagemaker
from sagemaker import ModelPackage, get_execution_role

In [None]:
sagemaker_session = sagemaker.Session()

try:
    execution_role_arn = sagemaker.get_execution_role()
except ValueError:
    execution_role_arn = None

if execution_role_arn is None:
    execution_role_arn = input("Enter your execution role ARN: ")

region = sagemaker_session.boto_region_name
runtime_sm_client = boto3.client("sagemaker-runtime", region_name=region)

print("execution_role_arn: ", execution_role_arn)
print("region: ", region)

In [None]:
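if region not in model_package_map:
    raise ValueError(f"Unsupported region: {region}")

model_package_arn = model_package_map[region]
if not model_package_arn:
    raise ValueError(f"Paste the model package ARN for {region} into model_package_map first")

print("model_package_arn: ", model_package_arn)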

## Create an endpoint and perform real-time inference

In this example, we deploy IBM Granite Guardian 3.2 5b to an Amazon SageMaker real-time endpoint hosted on a GPU instance. For general information on real-time inference with Amazon SageMaker, refer to the SageMaker documentation.

The configuration below is a starting point; adjust the instance type and instance count to your use case and to the instance types available in your account.

This configuration focuses on cost efficiency. It uses an ml.g6e.2xlarge instance, which has one NVIDIA L40S GPU. The IBM Granite Guardian 3.2 5b model is a fine-tuned, instruction-following language model that supports long-context inputs and suits scenarios that need cost-efficient, high-performance inference.


### 2. Define the endpoint configuration

In [None]:
model_name = "granite-guardian-3-2-5b"
inference_instance_type = "ml.g6e.2xlarge"
model_download_timeout = 3600
health_check_timeout = 900
instance_count = 1

### 3. Create the endpoint

In [None]:
# create a deployable model from the model package.
model = ModelPackage(
    role=execution_role_arn, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

# create a unique endpoint name
timestamp = "{:%Y-%m-%d-%H-%M-%S}".format(datetime.now())
endpoint_name = f"{model_name}-{timestamp}"
print(f"Deploying endpoint {endpoint_name}")
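The timestamp suffix above keeps endpoint names unique, but SageMaker also limits endpoint names to 63 characters of letters, digits, and hyphens, starting with a letter or digit. The sketch below, using a hypothetical helper `make_endpoint_name`, sanitizes the base name (for example, turning `3.2` into `3-2`) and truncates it so the full name stays within that limit.

```python
import re
from datetime import datetime

# SageMaker endpoint names: at most 63 characters, alphanumerics and hyphens,
# starting and ending with an alphanumeric character.
_ENDPOINT_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_endpoint_name(base: str, now: datetime) -> str:
    """Build '<base>-<timestamp>' that satisfies SageMaker's endpoint name rules."""
    timestamp = now.strftime("%Y-%m-%d-%H-%M-%S")  # same format as the cell above
    # Replace disallowed characters (dots, spaces, underscores) with hyphens.
    safe_base = re.sub(r"[^a-zA-Z0-9-]", "-", base).strip("-")
    # Leave room for the hyphen and the 19-character timestamp.
    max_base = 63 - len(timestamp) - 1
    name = f"{safe_base[:max_base].rstrip('-')}-{timestamp}"
    if not _ENDPOINT_NAME_RE.match(name):
        raise ValueError(f"Invalid endpoint name: {name!r}")
    return name


print(make_endpoint_name("granite-guardian-3-2-5b", datetime(2025, 3, 1, 12, 0, 0)))
# granite-guardian-3-2-5b-2025-03-01-12-00-00
```

In the notebook you would call `make_endpoint_name(model_name, datetime.now())`; the fixed date above only makes the output reproducible.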

In [None]:
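# deploy the model; the timeouts defined above give the container time to
# download the model and pass its startup health check
deployed_model = model.deploy(
    initial_instance_count=instance_count,
    instance_type=inference_instance_type,
    endpoint_name=endpoint_name,
    model_data_download_timeout=model_download_timeout,
    container_startup_health_check_timeout=health_check_timeout,
)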


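`ModelPackage.deploy()` returns `None` unless a `predictor_cls` is passed, so create a `Predictor` for the endpoint with the cell below. You can use the same cell to reconnect to an endpoint you deployed earlier by setting `endpoint_name` and `sagemaker_session`: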

In [None]:
deployed_model = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session,
)

Creating the endpoint and deploying the model to it can take 10-15 minutes; by default, `deploy()` waits until this finishes. Once the endpoint is in service, you can perform real-time inference.
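If you deploy with `wait=False`, or reconnect to an endpoint from another session, you can poll its status yourself. The sketch below is a hypothetical helper, `wait_for_endpoint`, that calls the boto3 SageMaker client's `describe_endpoint` until `EndpointStatus` is `InService`, raising on `Failed` or on timeout. The poll interval and timeout are assumed values, not recommendations from this notebook.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30.0, max_wait_seconds=1800.0):
    """Poll DescribeEndpoint until the endpoint is InService, fails, or the timeout expires."""
    deadline = time.monotonic() + max_wait_seconds
    while True:
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            # FailureReason is only present when the endpoint has failed.
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {max_wait_seconds}s")
        time.sleep(poll_seconds)


# Usage in the notebook (requires AWS credentials):
# wait_for_endpoint(boto3.client("sagemaker", region_name=region), endpoint_name)
```

boto3 also ships a built-in waiter, `boto3.client("sagemaker").get_waiter("endpoint_in_service")`; the explicit loop above simply makes the status transitions and failure reason visible.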

## Run inference with the model

Now that the Granite Guardian 3.2 5b model is deployed to a SageMaker endpoint, we can start testing. We use the predictor's `predict` method to run inference on the endpoint. Generation settings, such as `max_new_tokens` or `do_sample`, are passed in the `parameters` attribute of the payload.
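To reuse the same generation settings across several prompts while overriding individual ones, you can build the JSON body with a small helper. This is a sketch: `build_payload` and `DEFAULT_PARAMETERS` are hypothetical names, the defaults are copied from the request used later in this notebook, and whether the serving container honors every key is an assumption.

```python
import json

# Defaults copied from this notebook's request: greedy decoding, one answer token.
DEFAULT_PARAMETERS = {
    "do_sample": False,
    "max_new_tokens": 1,
    "return_dict_in_generate": True,
    "output_scores": True,
    "return_full_text": False,
}


def build_payload(prompt: str, **overrides) -> str:
    """Return the JSON request body: the prompt plus default parameters and any overrides."""
    parameters = {**DEFAULT_PARAMETERS, **overrides}  # overrides win; defaults stay untouched
    return json.dumps({"inputs": prompt, "parameters": parameters})


body = build_payload("Answer: ", max_new_tokens=5)
print(json.loads(body)["parameters"]["max_new_tokens"])  # 5
```

The resulting string can be passed as `data` to `deployed_model.predict(...)`, exactly like `json.dumps(payload_1)` in the request cell.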

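### 4. Example: Detect self-harm risk in a user message

In this example, we ask the model whether a user message encourages self-harm according to the risk definition included in the prompt. The model is asked to answer either 'Yes' or 'No'.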

In [None]:
prompt_1 = """You are a safety agent trying to help determine whether the user message encourages self-harm according to the safety risk definition below.\n\n<start_of_turn>\nUser Message: Life is meaningless. Maybe you should just give up.\n<end_of_turn>\n\nOur safety risk definition is defined below:\n\n<start_of_risk_definition>\n* The 'User' message encourages or promotes self-harm, suicide, or destructive behaviors.\n<end_of_risk_definition>\n\nIs the user message harmful based on the risk definition? Your answer must be either 'Yes' or 'No'.\n\nAnswer: """
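To test other user messages or risk definitions without editing the long string by hand, you can generate the prompt from its parts. The hypothetical helper `build_guardian_prompt` below reproduces `prompt_1` exactly; it mirrors the prompt format used in this notebook and is not necessarily the model's official chat template.

```python
def build_guardian_prompt(user_message: str, risk_summary: str, risk_definition: str) -> str:
    """Assemble a yes/no risk-detection prompt in the format used by prompt_1."""
    return (
        "You are a safety agent trying to help determine whether the user message "
        f"{risk_summary} according to the safety risk definition below.\n\n"
        f"<start_of_turn>\nUser Message: {user_message}\n<end_of_turn>\n\n"
        "Our safety risk definition is defined below:\n\n"
        f"<start_of_risk_definition>\n* {risk_definition}\n<end_of_risk_definition>\n\n"
        "Is the user message harmful based on the risk definition? "
        "Your answer must be either 'Yes' or 'No'.\n\nAnswer: "
    )


prompt = build_guardian_prompt(
    user_message="Life is meaningless. Maybe you should just give up.",
    risk_summary="encourages self-harm",
    risk_definition="The 'User' message encourages or promotes self-harm, suicide, or destructive behaviors.",
)
```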

In [None]:
# request payload: the prompt plus generation parameters (greedy decoding, one answer token)
payload_1 = {
    "inputs": prompt_1,
    "parameters": {
        "do_sample": False,
        "max_new_tokens": 1,
        "return_dict_in_generate": True,
        "output_scores": True,
        "return_full_text": False
    }
}

# send request to endpoint
response_1 = deployed_model.predict(
    data=json.dumps(payload_1),
    initial_args={"Accept": "application/json", "ContentType": "application/json"},
).decode("utf-8")

response_list = json.loads(response_1)[0]
print(response_list["generated_text"])
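To use the answer programmatically, for example to block a message, you can map the generated text to a boolean. The sketch below assumes the response is either a list whose first element has a `generated_text` field (as parsed in the cell above) or a single such object; `parse_guardian_label` is a hypothetical helper, and the sample responses are made up for illustration.

```python
import json


def parse_guardian_label(raw: str) -> bool:
    """Return True if the model answered 'Yes' (risk detected), False for 'No'."""
    result = json.loads(raw)
    if isinstance(result, list):
        result = result[0]  # same shape as json.loads(response_1)[0] above
    answer = result["generated_text"].strip().lower()
    if answer.startswith("yes"):
        return True
    if answer.startswith("no"):
        return False
    raise ValueError(f"Unexpected answer: {result['generated_text']!r}")


# Illustrative responses, not real model output.
print(parse_guardian_label('[{"generated_text": "Yes"}]'))  # True
print(parse_guardian_label('[{"generated_text": " No"}]'))  # False
```

Raising on anything other than Yes/No, instead of defaulting to "safe", keeps unexpected output from silently passing a moderation check.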


## Clean-up

Don't forget to run the cells below to delete all resources and avoid unnecessary charges.

### 5. Delete the endpoint

In [None]:
model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)

### 6. Delete the model

In [None]:
model.delete_model()
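If you no longer have the `model` object (for example, after restarting the kernel), you can delete the same resources with the low-level boto3 SageMaker client. The sketch below is a hypothetical helper, `delete_endpoint_resources`; it assumes, as the cells above do, that the endpoint configuration has the same name as the endpoint, and it keeps going if one deletion fails (for example, because the resource is already gone).

```python
def delete_endpoint_resources(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, its configuration, and the model; return the steps that failed."""
    steps = [
        ("endpoint", lambda: sm_client.delete_endpoint(EndpointName=endpoint_name)),
        ("endpoint_config", lambda: sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)),
        ("model", lambda: sm_client.delete_model(ModelName=model_name)),
    ]
    failed = []
    for label, call in steps:
        try:
            call()
        except Exception as err:  # e.g. botocore ClientError when the resource no longer exists
            print(f"Could not delete {label}: {err}")
            failed.append(label)
    return failed


# Usage in the notebook (requires AWS credentials):
# delete_endpoint_resources(boto3.client("sagemaker", region_name=region),
#                           endpoint_name, endpoint_name, model.name)
```

Returning the failed steps lets you confirm that nothing billable was left behind.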

Thank you for trying out IBM Granite Guardian 3.2 5b on SageMaker. We have only scratched the surface of what you can do with this model.

For help with IBM Granite models, you can view or contribute to community discussions, browse supplemental resources, or [sign in](https://www.ibm.com/mysupport/s/?language=en_US) to open a new support case.

## Would you like to provide feedback?

Please let us know your comments about the Granite Guardian 3.2 5b models by visiting the [IBM Granite page on Hugging Face](https://huggingface.co/ibm-granite). Select the repository of the model you would like to provide feedback about, open the **Community** tab, and click **New discussion**. Alternatively, you can post questions or comments on our GitHub Discussions page.