# Extract Content from Your File

This notebook demonstrates how to use the Content Understanding API to extract semantic content from multimodal files.

## Prerequisites
1. Ensure your Azure AI service is configured by following these [steps](../README.md#configure-azure-ai-service-resource).
2. Install the packages required to run the sample.

In [3]:
%pip install -r ../requirements.txt

Collecting aiohttp (from -r ../requirements.txt (line 1))
  Downloading aiohttp-3.12.15-cp313-cp313-macosx_10_13_x86_64.whl.metadata (7.7 kB)
Collecting azure-identity (from -r ../requirements.txt (line 2))
  Downloading azure_identity-1.24.0-py3-none-any.whl.metadata (86 kB)
Collecting azure-storage-blob (from -r ../requirements.txt (line 3))
  Downloading azure_storage_blob-12.26.0-py3-none-any.whl.metadata (26 kB)
Collecting python-dotenv (from -r ../requirements.txt (line 4))
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Collecting requests (from -r ../requirements.txt (line 5))
  Downloading requests-2.32.4-py3-none-any.whl.metadata (4.9 kB)
Collecting Pillow (from -r ../requirements.txt (line 6))
  Downloading pillow-11.3.0-cp313-cp313-macosx_10_13_x86_64.whl.metadata (9.0 kB)
Collecting aiohappyeyeballs>=2.5.0 (from aiohttp->-r ../requirements.txt (line 1))
  Downloading aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata (5.9 kB)
Collecting aiosignal>=1.4.0 (

## Create Azure AI Content Understanding Client

> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class containing functions to interact with the Content Understanding API. Until the official Content Understanding SDK is released, it can be regarded as a lightweight SDK. Fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with the information from your Azure AI Service.

> ⚠️ Important:
You must update the code below to match your Azure authentication method.
Look for the `# IMPORTANT` comments and modify those sections accordingly.
If you skip this step, the sample may not run correctly.

> ⚠️ Note: Using a subscription key works, but using a token provider with Azure Active Directory (AAD) is much safer and is highly recommended for production environments.

In [18]:
import logging
import json
import os
import sys
import uuid
from pathlib import Path
from dotenv import find_dotenv, load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

# For authentication, you can use either token-based auth or subscription key, and only one of them is required
AZURE_AI_ENDPOINT = os.getenv("AZURE_AI_ENDPOINT")
# IMPORTANT: Replace with your actual subscription key or set up in ".env" file if not using token auth
AZURE_AI_API_KEY = os.getenv("AZURE_AI_API_KEY")
AZURE_AI_API_VERSION = os.getenv("AZURE_AI_API_VERSION", "2025-05-01-preview")

# Add the parent directory to the path to use shared modules
parent_dir = Path.cwd().parent
sys.path.append(str(parent_dir))
from python.content_understanding_client import AzureContentUnderstandingClient

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

client = AzureContentUnderstandingClient(
    endpoint=AZURE_AI_ENDPOINT,
    api_version=AZURE_AI_API_VERSION,
    # IMPORTANT: Comment out token_provider if using subscription key
    token_provider=token_provider,
    # IMPORTANT: Uncomment this if using subscription key
    # subscription_key=AZURE_AI_API_KEY,
    # This header is used for sample usage telemetry; comment out the next line to opt out.
    x_ms_useragent="azure-ai-content-understanding-python/content_extraction",
)

# Utility function to save images
from PIL import Image
from io import BytesIO
import re

def save_image(image_id: str, response):
    """Download one image from the analyze operation and save it under .cache/ as a JPEG."""
    raw_image = client.get_image_from_analyze_operation(
        analyze_response=response,
        image_id=image_id,
    )
    image = Image.open(BytesIO(raw_image))
    # image.show()
    Path(".cache").mkdir(exist_ok=True)
    image.save(f".cache/{image_id}.jpg", "JPEG")


INFO:azure.identity._credentials.environment:No environment configuration found.
INFO:azure.identity._credentials.managed_identity:ManagedIdentityCredential will use IMDS
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=REDACTED&resource=REDACTED'
Request method: 'GET'
Request headers:
    'User-Agent': 'azsdk-python-identity/1.24.0 Python/3.13.6 (macOS-15.6-x86_64-i386-64bit-Mach-O)'
No body was attached to the request
INFO:azure.identity._credentials.chained:DefaultAzureCredential acquired a token from AzureCliCredential
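
The `# IMPORTANT` switches in the cell above can also be driven by the environment instead of by commenting lines in and out. The following is a minimal sketch, assuming the client takes exactly one of the `token_provider` or `subscription_key` keyword arguments as shown above. The helper name `choose_auth_kwargs` is hypothetical and is not part of the sample repository.

```python
import os


def choose_auth_kwargs(env=None):
    """Return keyword arguments for exactly one authentication method.

    Hypothetical helper: uses the subscription key when AZURE_AI_API_KEY is
    set, and falls back to token-based authentication otherwise.
    """
    env = os.environ if env is None else env
    api_key = env.get("AZURE_AI_API_KEY")
    if api_key:
        return {"subscription_key": api_key}

    # Imported lazily so the key-based path does not require azure-identity.
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider

    credential = DefaultAzureCredential()
    return {
        "token_provider": get_bearer_token_provider(
            credential, "https://cognitiveservices.azure.com/.default"
        )
    }


# Demonstration with an explicit mapping instead of the real environment.
print(sorted(choose_auth_kwargs({"AZURE_AI_API_KEY": "example-key"})))
# ['subscription_key']
```

The returned dictionary could then be unpacked into the constructor, for example `AzureContentUnderstandingClient(endpoint=AZURE_AI_ENDPOINT, api_version=AZURE_AI_API_VERSION, **choose_auth_kwargs())`.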


## Document Content

The Content Understanding API extracts all textual content from a specified document file. In addition to text extraction, it performs layout analysis to identify and categorize the tables and figures within the document. The output is presented as structured markdown, which keeps it easy to read and to process.



In [21]:
#ANALYZER_SAMPLE_FILE = '../data/invoice.pdf'
ANALYZER_SAMPLE_FILE = '../data/TrendzInputFile.pdf'
ANALYZER_ID = 'prebuilt-documentAnalyzer'

# Analyze the file
response = client.begin_analyze(ANALYZER_ID, file_location=ANALYZER_SAMPLE_FILE)
result_json = client.poll_result(response)

print(json.dumps(result_json, indent=2))

INFO:python.content_understanding_client:Analyzing file ../data/TrendzInputFile.pdf with analyzer: prebuilt-documentAnalyzer
INFO:python.content_understanding_client:Request ca14a61a-1c27-400d-b709-3dd915b28171 in progress ...
INFO:python.content_understanding_client:Request ca14a61a-1c27-400d-b709-3dd915b28171 in progress ...
INFO:python.content_understanding_client:Request ca14a61a-1c27-400d-b709-3dd915b28171 in progress ...
INFO:python.content_understanding_client:Request ca14a61a-1c27-400d-b709-3dd915b28171 in progress ...
INFO:python.content_understanding_client:Request ca14a61a-1c27-400d-b709-3dd915b28171 in progress ...
INFO:python.content_understanding_client:Request ca14a61a-1c27-400d-b709-3dd915b28171 in progress ...
INFO:python.content_understanding_client:Request ca14a61a-1c27-400d-b709-3dd915b28171 in progress ...
INFO:python.content_understanding_client:Request ca14a61a-1c27-400d-b709-3dd915b28171 in progress ...
INFO:python.content_understanding_client:Request ca14a61a-1

{
  "id": "ca14a61a-1c27-400d-b709-3dd915b28171",
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-documentAnalyzer",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "2025-08-14T21:21:04Z",
    "contents": [
      {
        "markdown": "DATE: 09-11-24 (LCA.RPT)\n\nLARGE CLAIM ALERTS\n\nFOR GROUP 18782 -\n\nEFFECTIVE DATE OF PLANYEAR IS 01-01-24\nGROUPS SPECIFIC DEDUCTIBLE PER INDIVIDUAL IS 750000.00\nFOR 01-01-24 TO 09-11-24\n\n\n<table>\n<tr>\n<th rowspan=\"2\">ALT ID</th>\n<th rowspan=\"2\">EM L YEE</th>\n<th rowspan=\"2\">NAME</th>\n<th colspan=\"2\" rowspan=\"2\">CLAIMANT</th>\n<th rowspan=\"2\">CIMT D B TE MDATE</th>\n<th rowspan=\"2\">BILLED</th>\n<th rowspan=\"2\">AID</th>\n<th rowspan=\"2\">CT S EC LS D</th>\n<th rowspan=\"2\">S EC S BMISSI N</th>\n<th rowspan=\"2\">DATE SE VICE</th>\n<th>OF</th>\n<th rowspan=\"2\">DESC</th>\n<th rowspan=\"2\">EMA K DESC</th>\n</tr>\n<tr>\n<th>ICD DIA N</th>\n</tr>\n<tr>\n<td></td>\n<td></td>\n<td></td>\n<td colspan=\

> The markdown output contains layout information, which is very useful for Retrieval-Augmented Generation (RAG) scenarios. You can paste the markdown into a viewer such as Visual Studio Code and preview the layout structure.

In [37]:
print(result_json["result"]["contents"][0])

{'markdown': 'DATE: 09-11-24 (LCA.RPT)\n\nLARGE CLAIM ALERTS\n\nFOR GROUP 18782 -\n\nEFFECTIVE DATE OF PLANYEAR IS 01-01-24\nGROUPS SPECIFIC DEDUCTIBLE PER INDIVIDUAL IS 750000.00\nFOR 01-01-24 TO 09-11-24\n\n\n<table>\n<tr>\n<th rowspan="2">ALT ID</th>\n<th rowspan="2">EM L YEE</th>\n<th rowspan="2">NAME</th>\n<th colspan="2" rowspan="2">CLAIMANT</th>\n<th rowspan="2">CIMT D B TE MDATE</th>\n<th rowspan="2">BILLED</th>\n<th rowspan="2">AID</th>\n<th rowspan="2">CT S EC LS D</th>\n<th rowspan="2">S EC S BMISSI N</th>\n<th rowspan="2">DATE SE VICE</th>\n<th>OF</th>\n<th rowspan="2">DESC</th>\n<th rowspan="2">EMA K DESC</th>\n</tr>\n<tr>\n<th>ICD DIA N</th>\n</tr>\n<tr>\n<td></td>\n<td></td>\n<td></td>\n<td colspan="2">S</td>\n<td>04-20-24</td>\n<td>1067.03</td>\n<td>67.13</td>\n<td>0</td>\n<td>HA37X03</td>\n<td>12-28-23</td>\n<td>10 Z94.0</td>\n<td>KIDNEY</td>\n<td>Provider</td>\n</tr>\n<tr>\n<td></td>\n<td></td>\n<td></td>\n<td colspan="2"></td>\n<td></td>\n<td></td>\n<td></td>\n<td></
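
To preview the markdown in a viewer, as suggested above, it can be written to a file first. The following is a minimal sketch, assuming the response shape shown in the output above (`result` → `contents` → `markdown`). The helper name `save_markdown` and the output path are hypothetical, and the sample dictionary is a tiny stand-in for the real response.

```python
from pathlib import Path


def save_markdown(result_json: dict, out_path: str) -> Path:
    """Write the markdown of the first content item to a file and return its path."""
    contents = result_json.get("result", {}).get("contents", [])
    if not contents:
        raise ValueError("The analyze result has no contents")
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(contents[0].get("markdown", ""), encoding="utf-8")
    return path


# Stand-in for the real response, shaped like the output above.
sample = {"result": {"contents": [{"markdown": "LARGE CLAIM ALERTS\n\nFOR GROUP 18782 -"}]}}
saved = save_markdown(sample, ".cache/document.md")
print(saved.read_text(encoding="utf-8").splitlines()[0])  # LARGE CLAIM ALERTS
```

In the notebook, passing `result_json` instead of `sample` would write the full document markdown.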

> You can get the layout information, including `words` and `lines` in the `pages` node, paragraph information in `paragraphs`, and table structure in `tables`.

In [33]:
print(json.dumps(result_json["result"]["contents"][0]["pages"][0], indent=2))


{
  "pageNumber": 1,
  "angle": -0.0791,
  "width": 11,
  "height": 8.5,
  "spans": [
    {
      "offset": 0,
      "length": 6554
    }
  ],
  "words": [
    {
      "content": "DATE:",
      "span": {
        "offset": 0,
        "length": 5
      },
      "confidence": 0.992,
      "source": "D(1,0.2312,1.166,0.5221,1.1654,0.522,1.2654,0.2312,1.2639)"
    },
    {
      "content": "09-11-24",
      "span": {
        "offset": 6,
        "length": 8
      },
      "confidence": 0.995,
      "source": "D(1,0.6042,1.1651,1.0743,1.1669,1.0741,1.2673,0.6041,1.2659)"
    },
    {
      "content": "(LCA.RPT)",
      "span": {
        "offset": 15,
        "length": 9
      },
      "confidence": 0.952,
      "source": "D(1,1.1647,1.1678,1.6608,1.1679,1.6608,1.2673,1.1645,1.2673)"
    },
    {
      "content": "LARGE",
      "span": {
        "offset": 26,
        "length": 5
      },
      "confidence": 0.994,
      "source": "D(1,6.4676,1.2936,6.7793,1.2935,6.7795,1.3816,6.4682,1.3829)"

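
Each `source` value in the page output above appears to pack a page number and a four-point polygon into one string. The following is a minimal sketch for unpacking it, assuming the layout `D(page,x1,y1,x2,y2,x3,y3,x4,y4)`. That reading is an interpretation of the sample output, and the helper names `parse_source` and `bounding_box` are hypothetical.

```python
import re

# Matches strings such as "D(1,0.2312,1.166,...)".
SOURCE_PATTERN = re.compile(r"^D\((\d+),([^)]*)\)$")


def parse_source(source: str):
    """Split a source string into (page_number, [(x, y), ...])."""
    match = SOURCE_PATTERN.match(source)
    if match is None:
        raise ValueError(f"Unrecognized source string: {source!r}")
    page = int(match.group(1))
    numbers = [float(value) for value in match.group(2).split(",")]
    if len(numbers) % 2 != 0:
        raise ValueError("Expected an even number of coordinates")
    # Pair up alternating x and y values.
    points = list(zip(numbers[0::2], numbers[1::2]))
    return page, points


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Source string of the word "DATE:" from the output above.
page, points = parse_source("D(1,0.2312,1.166,0.5221,1.1654,0.522,1.2654,0.2312,1.2639)")
print(page, points[0], bounding_box(points))
```

An axis-aligned bounding box is often enough for highlighting a word on a rendered page; the full polygon is kept for pages that are rotated, as the `angle` field suggests.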

> The following statement retrieves the structural information of the tables in the document.

In [35]:
print(json.dumps(result_json["result"]["contents"][0]["tables"], indent=2))

[
  {
    "rowCount": 36,
    "columnCount": 14,
    "cells": [
      {
        "kind": "columnHeader",
        "rowIndex": 0,
        "columnIndex": 0,
        "rowSpan": 2,
        "columnSpan": 1,
        "content": "ALT ID",
        "source": "D(1,0.2076,2.1229,0.8144,2.1229,0.8074,2.465,0.2007,2.465)",
        "span": {
          "offset": 215,
          "length": 6
        },
        "elements": [
          "/paragraphs/4"
        ]
      },
      {
        "kind": "columnHeader",
        "rowIndex": 0,
        "columnIndex": 1,
        "rowSpan": 2,
        "columnSpan": 1,
        "content": "EM L YEE",
        "source": "D(1,0.8144,2.1229,1.456,2.1229,1.456,2.465,0.8074,2.465)",
        "span": {
          "offset": 243,
          "length": 8
        },
        "elements": [
          "/paragraphs/5"
        ]
      },
      {
        "kind": "columnHeader",
        "rowIndex": 0,
        "columnIndex": 2,
        "rowSpan": 2,
        "columnSpan": 1,
        "content": "NAME

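
The cell list above can be turned back into a row-by-column grid using the `rowIndex` and `columnIndex` of each cell. The following is a minimal sketch, assuming only the fields visible in the output above. The helper name `table_to_grid` is hypothetical, and the sample table is a trimmed stand-in containing the first two header cells.

```python
def table_to_grid(table: dict):
    """Place each cell's content into a rowCount x columnCount grid.

    A cell that spans several rows or columns is written only at its
    top-left position; the positions it covers stay as empty strings.
    """
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "")
    return grid


# Trimmed stand-in for result_json["result"]["contents"][0]["tables"][0].
sample_table = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 0,
         "rowSpan": 2, "columnSpan": 1, "content": "ALT ID"},
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 1,
         "rowSpan": 2, "columnSpan": 1, "content": "EM L YEE"},
    ],
}

grid = table_to_grid(sample_table)
for row in grid:
    print(row)
```

Leaving spanned positions empty mirrors how the markdown output uses `rowspan` and `colspan`; a different design could copy the content into every covered position instead.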
## Audio Content
Our API output facilitates detailed analysis of spoken language, allowing developers to utilize the data for various applications, such as voice recognition, customer service analytics, and conversational AI. The structure of the output makes it easy to extract and analyze different components of the conversation for further processing or insights.

1. Speaker Identification: Each phrase is attributed to a specific speaker (in this case, "Speaker 1"). This keeps conversations with multiple participants easy to follow.
1. Timing Information: Each transcription includes precise timing data:
    - startTimeMs: The time (in milliseconds) when the phrase begins.
    - endTimeMs: The time (in milliseconds) when the phrase ends.
    This information is crucial for applications like video subtitles, allowing synchronization between the audio and the text.
1. Text Content: The actual spoken text is provided, which in this instance is "Thank you for calling Woodgrove Travel." This is the main content of the transcription.
1. Confidence Score: Each transcription phrase includes a confidence score (0.933 in this case), indicating the likelihood that the transcription is accurate. A higher score suggests greater reliability.
1. Word-Level Breakdown: The transcription is further broken down into individual words, each with its own timing information. This allows for detailed analysis of speech patterns and can be useful for applications such as language processing or speech recognition improvement.
1. Locale Specification: The locale is specified as "en-US," indicating that the transcription is in American English. This is important for ensuring that the transcription algorithms account for regional dialects and pronunciations.
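
The timing fields described above map naturally onto subtitle formats. The following is a minimal sketch that turns a list of phrases into SubRip (SRT) text. The field names `startTimeMs` and `endTimeMs` come from the list above, while the `speaker` and `text` keys and the helper names are assumptions made for illustration. The sample timings are taken from the first two cues of the transcript for `audio.wav`.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def phrases_to_srt(phrases) -> str:
    """Build SRT text from phrases with startTimeMs, endTimeMs, speaker and text."""
    blocks = []
    for index, phrase in enumerate(phrases, start=1):
        start = ms_to_srt_time(phrase["startTimeMs"])
        end = ms_to_srt_time(phrase["endTimeMs"])
        blocks.append(f"{index}\n{start} --> {end}\n{phrase['speaker']}: {phrase['text']}")
    return "\n\n".join(blocks) + "\n"


# Stand-in phrases; the "speaker" and "text" keys are assumed names.
sample_phrases = [
    {"speaker": "Speaker 1", "startTimeMs": 80, "endTimeMs": 2160,
     "text": "Thank you for calling Woodgrove Travel."},
    {"speaker": "Speaker 1", "startTimeMs": 2960, "endTimeMs": 4560,
     "text": "My name is Isabella Taylor."},
]

srt_text = phrases_to_srt(sample_phrases)
print(srt_text)
```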


In [9]:
ANALYZER_SAMPLE_FILE = '../data/audio.wav'
ANALYZER_ID = 'prebuilt-audioAnalyzer'

# Analyze the file
response = client.begin_analyze(ANALYZER_ID, file_location=ANALYZER_SAMPLE_FILE)
result_json = client.poll_result(response)

print(json.dumps(result_json, indent=2))

INFO:python.content_understanding_client:Analyzing file ../data/audio.wav with analyzer: prebuilt-audioAnalyzer
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b-1f7362766db3 in progress ...
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b-1f7362766db3 in progress ...
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b-1f7362766db3 in progress ...
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b-1f7362766db3 in progress ...
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b-1f7362766db3 in progress ...
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b-1f7362766db3 in progress ...
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b-1f7362766db3 in progress ...
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b-1f7362766db3 in progress ...
INFO:python.content_understanding_client:Request 79e9edf9-e50b-4fe3-a17b

{
  "id": "79e9edf9-e50b-4fe3-a17b-1f7362766db3",
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-audioAnalyzer",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "2025-08-14T18:34:59Z",
    "stringEncoding": "utf8",
    "contents": [
      {
        "markdown": "# Audio: 00:00.000 => 01:54.670\n\nTranscript\n```\nWEBVTT\n\n00:00.080 --> 00:02.160\n<v Speaker 1>Thank you for calling Woodgrove Travel.\n\n00:02.960 --> 00:04.560\n<v Speaker 1>My name is Isabella Taylor.\n\n00:05.280 --> 00:06.880\n<v Speaker 1>How may I assist you today?\n\n00:07.680 --> 00:10.240\n<v Speaker 2>Hi Isabella, my name is John Smith.\n\n00:11.040 --> 00:17.920\n<v Speaker 2>I recently traveled from New York City to Los Angeles on a business trip, and I had a terrible experience with my flight.\n\n00:18.720 --> 00:20.880\n<v Speaker 1>I'm sorry to hear that, John.\n\n00:21.680 --> 00:27.200\n<v Speaker 1>Could you please provide me with the details of your flight, such as the airlin

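
The `markdown` field above embeds the transcript as WEBVTT cues, which can be parsed back into structured records. The following is a minimal sketch, assuming each cue is a timing line followed by a single line of text with a voice tag, as in the output above. The helper names `to_ms` and `parse_cues` are hypothetical, and the sample string contains two cues copied from that output.

```python
import re

# A timing line followed by one text line with a voice tag.
# The "v " prefix is optional, and so is an hours component in the timestamps.
CUE_PATTERN = re.compile(
    r"((?:\d{2}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2}:)?\d{2}:\d{2}\.\d{3})\n"
    r"<(?:v )?([^>]+)>(.*)"
)


def to_ms(timestamp: str) -> int:
    """Convert 'MM:SS.mmm' or 'HH:MM:SS.mmm' to milliseconds."""
    *head, last = timestamp.split(":")
    seconds, millis = last.split(".")
    minutes = int(head[-1])
    hours = int(head[-2]) if len(head) == 2 else 0
    return ((hours * 60 + minutes) * 60 + int(seconds)) * 1000 + int(millis)


def parse_cues(markdown: str):
    """Return one dict per cue with start/end in milliseconds, speaker and text."""
    return [
        {"startMs": to_ms(start), "endMs": to_ms(end), "speaker": speaker, "text": text.strip()}
        for start, end, speaker, text in CUE_PATTERN.findall(markdown)
    ]


sample = (
    "WEBVTT\n\n"
    "00:00.080 --> 00:02.160\n<v Speaker 1>Thank you for calling Woodgrove Travel.\n\n"
    "00:07.680 --> 00:10.240\n<v Speaker 2>Hi Isabella, my name is John Smith.\n"
)

cues = parse_cues(sample)
for cue in cues:
    print(cue)
```

In the notebook, `parse_cues(result_json["result"]["contents"][0]["markdown"])` would apply the same parsing to the full transcript.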
## Video Content
Video output provides detailed information about audiovisual content, specifically video shots. Here are the key features it offers:

1. Shot Information: Each shot is defined by a start and end time, along with a unique identifier. For example, Shot 0:0.0 to 0:2.800 includes a transcript and key frames.
1. Transcript: The API includes a transcript of the audio, formatted in WEBVTT, which allows for easy synchronization with the video. It captures spoken content and specifies the timing of the dialogue.
1. Key Frames: It provides a series of key frames (images) that represent important moments in the video shot, allowing users to visualize the content at specific timestamps.
1. Description: Each shot is accompanied by a description, providing context about the visuals presented. This helps in understanding the scene or subject matter without watching the video.
1. Audio Visual Metadata: Details about the video such as dimensions (width and height), type (audiovisual), and the presence of key frame timestamps are included.
1. Transcript Phrases: The output includes specific phrases from the transcript, along with timing and speaker information, enhancing the usability for applications like closed captioning or search functionalities.

In [10]:
ANALYZER_SAMPLE_FILE = '../data/FlightSimulator.mp4'
ANALYZER_ID = 'prebuilt-videoAnalyzer'

# Analyze the file
response = client.begin_analyze(ANALYZER_ID, file_location=ANALYZER_SAMPLE_FILE)
result_json = client.poll_result(response)

print(json.dumps(result_json, indent=2))

# Save keyframes (optional)
keyframe_ids = set()
result_data = result_json.get("result", {})
contents = result_data.get("contents", [])

# Iterate over contents to find keyframes if available
for content in contents:
    # Extract keyframe IDs from "markdown" if it exists and is a string
    markdown_content = content.get("markdown", "")
    if isinstance(markdown_content, str):
        keyframe_ids.update(re.findall(r"(keyFrame\.\d+)\.jpg", markdown_content))

# Output the results
print("Unique Keyframe IDs:", keyframe_ids)

# Save all keyframe images
for keyframe_id in keyframe_ids:
    save_image(keyframe_id, response)

INFO:python.content_understanding_client:Analyzing file ../data/FlightSimulator.mp4 with analyzer: prebuilt-videoAnalyzer
INFO:python.content_understanding_client:Request 71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf in progress ...
INFO:python.content_understanding_client:Request 71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf in progress ...
INFO:python.content_understanding_client:Request 71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf in progress ...
INFO:python.content_understanding_client:Request 71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf in progress ...
INFO:python.content_understanding_client:Request 71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf in progress ...
INFO:python.content_understanding_client:Request 71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf in progress ...
INFO:python.content_understanding_client:Request 71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf in progress ...
INFO:python.content_understanding_client:Request 71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf in progress ...
INFO:python.content_understanding_client:Request 71a6d26f-a07c

{
  "id": "71a6d26f-a07c-41e2-8a0c-fdf75d8ea8cf",
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-videoAnalyzer",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "2025-08-14T18:35:29Z",
    "stringEncoding": "utf8",
    "contents": [
      {
        "markdown": "# Video: 00:00.000 => 00:43.866\nWidth: 1080\nHeight: 608\n\n## Segment 1: 00:00.000 => 00:07.367\nThe video begins with an aerial view of a scenic island, featuring the logos of Flight Simulator and Microsoft Azure AI. The audio discusses the importance of good data for neural TTS (Text-to-Speech) to achieve a good voice. It mentions building a universal TTS model based on 3,000 hours of data to capture audio nuances and generate a more natural voice. The visuals transition to a waveform display, representing audio data.\n\nTranscript\n```\nWEBVTT\n\n00:01.360 --> 00:06.640\n<Speaker 1>When it comes to the neural TTS, in order to get a good voice, it's better to have good data.\n\n00:07.120 --> 00:1

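
The video markdown above is organized into `## Segment N: start => end` headings, each followed by a description. The following is a minimal sketch for listing the segments, assuming that heading format and a blank line after each description, as seen in the output above. The helper name `list_segments` is hypothetical, and the sample string is a shortened stand-in for the real markdown.

```python
import re

# A segment heading, then the description up to the next blank line.
SEGMENT_PATTERN = re.compile(
    r"^## Segment (\d+): (\S+) => (\S+)\n(.*?)(?:\n\n|\Z)",
    re.MULTILINE | re.DOTALL,
)


def list_segments(markdown: str):
    """Return one dict per segment with its number, time range and description."""
    return [
        {"segment": int(number), "start": start, "end": end, "description": text.strip()}
        for number, start, end, text in SEGMENT_PATTERN.findall(markdown)
    ]


# Shortened stand-in for result_json["result"]["contents"][0]["markdown"].
sample = (
    "# Video: 00:00.000 => 00:43.866\nWidth: 1080\nHeight: 608\n\n"
    "## Segment 1: 00:00.000 => 00:07.367\n"
    "The video begins with an aerial view of a scenic island, featuring the logos "
    "of Flight Simulator and Microsoft Azure AI.\n\n"
    "Transcript\n"
)

segments = list_segments(sample)
print(segments)
```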
## Video Content with Face
This is a gated feature. Please complete the [Azure AI Resource Face Gating](https://learn.microsoft.com/en-us/legal/cognitive-services/computer-vision/limited-access-identity?context=%2Fazure%2Fai-services%2Fcomputer-vision%2Fcontext%2Fcontext#registration-process) registration process, and select `[Video Indexer] Facial Identification (1:N or 1:1 matching) to search for a face in a media or entertainment video archive to find a face within a video and generate metadata for media or entertainment use cases only` in the registration form.

In [11]:
ANALYZER_SAMPLE_FILE = '../data/FlightSimulator.mp4'
ANALYZER_ID = 'prebuilt-videoAnalyzer'

# Analyze the file
response = client.begin_analyze(ANALYZER_ID, file_location=ANALYZER_SAMPLE_FILE)
result_json = client.poll_result(response)

print(json.dumps(result_json, indent=2))

INFO:python.content_understanding_client:Analyzing file ../data/FlightSimulator.mp4 with analyzer: prebuilt-videoAnalyzer
INFO:python.content_understanding_client:Request 98cc3a26-c34b-4975-b103-70cb107f5dcd in progress ...
INFO:python.content_understanding_client:Request 98cc3a26-c34b-4975-b103-70cb107f5dcd in progress ...
INFO:python.content_understanding_client:Request 98cc3a26-c34b-4975-b103-70cb107f5dcd in progress ...
INFO:python.content_understanding_client:Request 98cc3a26-c34b-4975-b103-70cb107f5dcd in progress ...
INFO:python.content_understanding_client:Request 98cc3a26-c34b-4975-b103-70cb107f5dcd in progress ...
INFO:python.content_understanding_client:Request 98cc3a26-c34b-4975-b103-70cb107f5dcd in progress ...
INFO:python.content_understanding_client:Request 98cc3a26-c34b-4975-b103-70cb107f5dcd in progress ...
INFO:python.content_understanding_client:Request 98cc3a26-c34b-4975-b103-70cb107f5dcd in progress ...
INFO:python.content_understanding_client:Request 98cc3a26-c34b

{
  "id": "98cc3a26-c34b-4975-b103-70cb107f5dcd",
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-videoAnalyzer",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "2025-08-14T18:38:44Z",
    "stringEncoding": "utf8",
    "contents": [
      {
        "markdown": "# Video: 00:00.000 => 00:43.866\nWidth: 1080\nHeight: 608\n\n## Segment 1: 00:00.000 => 00:07.367\nThe video begins with an aerial view of a scenic island, featuring the logos of Flight Simulator and Microsoft Azure AI. This is followed by a brief introduction to neural TTS (Text-to-Speech) technology, emphasizing the importance of good data for achieving high-quality voice synthesis. The segment includes visuals of audio waveforms, representing the technical aspect of TTS.\n\nTranscript\n```\nWEBVTT\n\n00:01.360 --> 00:06.640\n<Speaker 1>When it comes to the neural TTS, in order to get a good voice, it's better to have good data.\n\n00:07.120 --> 00:13.320\n<Speaker 2>To achieve that, we build a uni

### Get and Save Key Frames and Face Thumbnails

In [12]:
# Initialize sets for unique face IDs and keyframe IDs
face_ids = set()
keyframe_ids = set()

# Extract unique face IDs safely
result_data = result_json.get("result", {})
contents = result_data.get("contents", [])

# Iterate over contents to find faces and keyframes if available
for content in contents:
    # Safely retrieve face IDs if "faces" exists and is a list
    faces = content.get("faces", [])
    if isinstance(faces, list):
        for face in faces:
            face_id = face.get("faceId")
            if face_id:
                face_ids.add(f"face.{face_id}")

    # Extract keyframe IDs from "markdown" if it exists and is a string
    markdown_content = content.get("markdown", "")
    if isinstance(markdown_content, str):
        keyframe_ids.update(re.findall(r"(keyFrame\.\d+)\.jpg", markdown_content))

# Output the results
print("Unique Face IDs:", face_ids)
print("Unique Keyframe IDs:", keyframe_ids)

# Save all face images
for face_id in face_ids:
    save_image(face_id, response)

# Save all keyframe images
for keyframe_id in keyframe_ids:
    save_image(keyframe_id, response)

Unique Face IDs: set()
Unique Keyframe IDs: {'keyFrame.40590', 'keyFrame.39897', 'keyFrame.43230', 'keyFrame.14190', 'keyFrame.7788', 'keyFrame.16929', 'keyFrame.38181', 'keyFrame.2046', 'keyFrame.36861', 'keyFrame.2640', 'keyFrame.30822', 'keyFrame.38676', 'keyFrame.10560', 'keyFrame.26532', 'keyFrame.9768', 'keyFrame.29106', 'keyFrame.18579', 'keyFrame.12078', 'keyFrame.8976', 'keyFrame.21714', 'keyFrame.42636', 'keyFrame.25674', 'keyFrame.4884', 'keyFrame.20196', 'keyFrame.32406', 'keyFrame.20955', 'keyFrame.5709', 'keyFrame.17754', 'keyFrame.36069', 'keyFrame.23232', 'keyFrame.34584', 'keyFrame.28248', 'keyFrame.27390', 'keyFrame.31614', 'keyFrame.12804', 'keyFrame.41283', 'keyFrame.726', 'keyFrame.24816', 'keyFrame.4059', 'keyFrame.33891', 'keyFrame.15444', 'keyFrame.6534', 'keyFrame.22473', 'keyFrame.14817'}
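
Because the IDs are collected in a `set`, they print in arbitrary order, and a plain string sort would place `keyFrame.14190` before `keyFrame.2046`. The following is a minimal sketch that sorts them by their numeric suffix; the helper name `sort_keyframe_ids` is hypothetical. Reading the suffix as a millisecond offset into the video is an assumption, although it is consistent with the largest value above (43230) falling within the 43.866-second video.

```python
def sort_keyframe_ids(keyframe_ids):
    """Sort IDs such as 'keyFrame.726' by their numeric suffix."""
    return sorted(keyframe_ids, key=lambda keyframe_id: int(keyframe_id.rsplit(".", 1)[1]))


# A few IDs taken from the output above.
ids = {"keyFrame.40590", "keyFrame.726", "keyFrame.7788", "keyFrame.2046"}
ordered = sort_keyframe_ids(ids)
print(ordered)
# ['keyFrame.726', 'keyFrame.2046', 'keyFrame.7788', 'keyFrame.40590']
```

Iterating over `sort_keyframe_ids(keyframe_ids)` in the save loop would write the images in that order.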
