# Nova multimodal understanding prompt engineering sandbox

An effective system prompt and tool-use configuration usually takes several iterations to get right. For complex tool-use JSON structures, it’s best practice to test them before integrating them with the video understanding tool.

This notebook lets you do exactly that: test your prompts and configurations with lightweight inputs (such as a single image or a short video) so you can validate results quickly without processing an entire video.

In [None]:
import boto3
import json
import time
from utils import bedrock_tool
from IPython.display import JSON, Markdown, display

## Test Nova image understanding configuration

The following sample defines a system prompt and a toolUse configuration that return the detected objects together with their bounding-box coordinates.

In [None]:
prompt = "Detect objects in the image and return their coordinates"

In [None]:
tool_config = {
    "toolChoice": {
        "tool": {
            "name": "tool_result_with_bb"
        }
    },
    "tools": [
        {
            "toolSpec": {
                "name": "tool_result_with_bb",
                "description": "Detect objects in the image and return their coordinates",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "objects": {
                                "type": "array",
                                "items": {
                                    "type": "object",
                                    "properties": {
                                        "name": {
                                            "type": "string",
                                            "enum": [
                                                "person",
                                                "car",
                                                "package"
                                            ]
                                        },
                                        "coordinates": {
                                            "type": "object",
                                            "properties": {
                                                "x": {
                                                    "type": "number"
                                                },
                                                "width": {
                                                    "type": "number"
                                                },
                                                "y": {
                                                    "type": "number"
                                                },
                                                "height": {
                                                    "type": "number"
                                                }
                                            },
                                            "required": [
                                                "x",
                                                "y",
                                                "width",
                                                "height"
                                            ]
                                        }
                                    },
                                    "required": [
                                        "name",
                                        "coordinates"
                                    ]
                                }
                            }
                        },
                        "required": [
                            "objects"
                        ]
                    }
                }
            }
        }
    ]
}
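Before sending real media, it can help to check that a payload has the shape the schema above asks for. The following is a minimal sketch in plain Python, not part of `utils.bedrock_tool`; the function name `validate_detections` and the sample payload are hypothetical and exist only for illustration.

```python
def validate_detections(payload):
    """Return a list of problems found in a tool_result_with_bb-style payload.

    Hypothetical helper: it mirrors the schema above by hand and is not
    a full JSON Schema validator.
    """
    allowed_names = {"person", "car", "package"}
    required_coords = ("x", "y", "width", "height")
    errors = []

    objects = payload.get("objects") if isinstance(payload, dict) else None
    if not isinstance(objects, list):
        return ["'objects' must be a list"]

    for i, obj in enumerate(objects):
        if not isinstance(obj, dict):
            errors.append(f"objects[{i}] must be an object")
            continue
        if obj.get("name") not in allowed_names:
            errors.append(f"objects[{i}].name must be one of {sorted(allowed_names)}")
        coords = obj.get("coordinates")
        if not isinstance(coords, dict):
            errors.append(f"objects[{i}].coordinates must be an object")
            continue
        for key in required_coords:
            value = coords.get(key)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"objects[{i}].coordinates.{key} must be a number")
    return errors


# Made-up payload for illustration; it is not real model output.
sample_payload = {
    "objects": [
        {"name": "person", "coordinates": {"x": 10, "y": 20, "width": 30, "height": 40}}
    ]
}
print(validate_detections(sample_payload))
```

An empty list means the payload matches the expected shape; any other result lists the fields to look at.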

Test the prompt using a local image.

In [None]:
response = bedrock_tool.bedrock_converse(
    model_id='amazon.nova-lite-v1:0',
    prompt=prompt,
    tool_config=tool_config,
    inference_config={
        "maxTokens": 500,
        "topP": 0.1,
        "temperature": 0.7
    },
    local_file_path="./samples/door-bell-camera.jpeg"
)
custom_output = bedrock_tool.parse_converse_response(response)
display(JSON(custom_output))
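The schema does not say which units the returned coordinates use, so inspect a few responses before drawing boxes. As a sketch, assuming the values sit on a fixed grid (the default `scale=1000` below is an assumption, not something this notebook confirms), a hypothetical helper can map them to pixels:

```python
def to_pixel_box(coords, image_width, image_height, scale=1000):
    """Convert a box on a 0..scale grid into integer pixel values.

    `scale` is an assumption: set it to the range your prompt asks for,
    or to 1 if the model returns fractions of the image size.
    """
    return {
        "x": round(coords["x"] / scale * image_width),
        "y": round(coords["y"] / scale * image_height),
        "width": round(coords["width"] / scale * image_width),
        "height": round(coords["height"] / scale * image_height),
    }


# Made-up box on a 1920x1080 image, for illustration only.
example_box = {"x": 100, "y": 200, "width": 300, "height": 400}
print(to_pixel_box(example_box, image_width=1920, image_height=1080))
```

If the boxes land in the wrong place, the scale assumption is the first thing to revisit; stating the expected range in the prompt or in the schema descriptions removes the guesswork.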

## Analyze video
Here’s a sample for video understanding that you can use as a starting point.

In [None]:
prompt = "You are a content moderation expert tasked with reviewing videos to detect unsafe content. Do not include your reasoning in the result."

In [None]:
tool_config = {
      "tools": [
        {
          "toolSpec": {
            "name": "tool_result",
            "description": "Detect whether the provided video contains any unsafe content and classify it into the following categories.",
            "inputSchema": {
              "json": {
                "type": "object",
                "properties": {
                  "suggestive": {
                    "type": "boolean",
                    "description": "Indicates if the video contains bare-chested males, revealing clothing, or other inappropriate scenes."
                  },
                  "tobacco": {
                    "type": "boolean",
                    "description": "Indicates if the video contains any scenes involving tobacco or smoking."
                  },
                  "alcohol": {
                    "type": "boolean",
                    "description": "Indicates if the video contains any scenes involving alcohol."
                  }
                },
                "required": [
                  "suggestive",
                  "tobacco",
                  "alcohol"
                ]
              }
            }
          }
        }
      ]
    }
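The image configuration earlier in this notebook sets `toolChoice` to force a specific tool, while the video configuration above leaves it out, so the model may reply in plain text instead of calling `tool_result`. If you want the same behavior for video, one option is to add the key before sending the request. The helper name `with_forced_tool` and the stub config below are hypothetical:

```python
import copy


def with_forced_tool(tool_config, tool_name):
    """Return a copy of tool_config whose toolChoice forces `tool_name`.

    Hypothetical helper; the original dict is left unchanged.
    """
    names = [tool["toolSpec"]["name"] for tool in tool_config.get("tools", [])]
    if tool_name not in names:
        raise ValueError(f"{tool_name!r} is not defined in tools: {names}")
    forced = copy.deepcopy(tool_config)
    forced["toolChoice"] = {"tool": {"name": tool_name}}
    return forced


# Stub config with the same layout as the one above, for illustration only.
example_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "tool_result",
                "description": "stub",
                "inputSchema": {"json": {"type": "object", "properties": {}}},
            }
        }
    ]
}
forced_config = with_forced_tool(example_config, "tool_result")
print(forced_config["toolChoice"])
```

Working on a copy keeps the original configuration available, which makes it easy to compare responses with and without a forced tool.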


In [None]:
response = bedrock_tool.bedrock_converse(
    model_id='amazon.nova-lite-v1:0',
    prompt=prompt,
    tool_config=tool_config,
    inference_config={
        "maxTokens": 500,
        "topP": 0.1,
        "temperature": 0.7
    },
    local_file_path="./samples/content-moderation-demo.mp4"
)
custom_output = bedrock_tool.parse_converse_response(response)
display(JSON(custom_output))
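Once the response is parsed, the boolean flags can be reduced to a short list of categories to review. This sketch assumes the parsed output is a flat dict keyed by the schema's property names, which this notebook does not confirm for `parse_converse_response`; the function name `flagged_categories` and the sample result are hypothetical.

```python
def flagged_categories(result):
    """Return the sorted names of categories whose flag is exactly True.

    Assumes `result` is a flat dict such as
    {"suggestive": bool, "tobacco": bool, "alcohol": bool}.
    """
    return sorted(key for key, value in result.items() if value is True)


# Made-up result for illustration; it is not real model output.
sample_result = {"suggestive": False, "tobacco": True, "alcohol": True}
print(flagged_categories(sample_result))
```

Comparing with `is True` skips non-boolean values such as the string `"true"`, so a malformed response does not get counted as a positive flag.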