# Azure OpenAI o1 multimodal test
The following sample shows the most basic way to call the o1 (GA) model with vision input from code.

> ✨ ***Note*** <br>
> Please check the supported models and API version before you get started - https://azure.microsoft.com/en-us/blog/announcing-the-o1-model-in-azure-openai-service-multimodal-reasoning-with-astounding-analysis/?msockid=388843f556c46b710353575a57e66a9a

## Prerequisites
Configure a Python virtual environment for 3.10 or later (the steps below use the VS Code Command Palette):
 1. Open the Command Palette (Ctrl+Shift+P).
 1. Search for Python: Create Environment.
 1. Select Venv / Conda and choose where to create the new environment.
 1. Select the Python interpreter version. Create with version 3.10 or later.

To install the packages the notebook requires, run the commands below.

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate

pip install -r requirements.txt
```
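
To confirm that the active interpreter meets the 3.10 minimum stated above, a check along these lines can be run inside the new environment. This is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of the repository.

```python
import sys

MINIMUM_VERSION = (3, 10)  # minimum stated in the prerequisites


def meets_minimum(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) part of the version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} meets minimum: {meets_minimum()}")
```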

## Set up your environment
Clone the repository to your local machine.

```bash
git clone https://github.com/hyogrin/Azure_OpenAI_samples.git
```

Create a .env file based on the .env-sample file. Copy the new .env file to the folder containing your notebook and update the variables.
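
A quick way to catch an incomplete .env file is to check the variables before creating the client. The sketch below uses the variable names that the notebook reads with `os.getenv`; treating all four as required is an assumption, and `missing_settings` is a hypothetical helper.

```python
import os

# Variable names read by the notebook. Which ones are strictly required
# is an assumption made for this sketch.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
)


def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    if env is None:
        env = os.environ
    return [name for name in required if not (env.get(name) or "").strip()]
```

Calling `missing_settings()` after `load_dotenv(override=True)` returns an empty list when every variable is populated.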

## 🔨 Current Support and Limitations (as of 2025-01-06) 
- To process images, you must currently call o1 through the API.
- The GA model version is 2024-12-17. Check the deployed model version before calling.
- o1 is enabled only for API version **2024-12-01-preview** and later. Set the API version accordingly.
- Check the regions where o1 is supported - https://learn.microsoft.com/en-us/azure/cognitive-services/openai/reference#o1-modelsopen ai rest
- Check the API versions that support o1 - https://learn.microsoft.com/en-us/azure/ai-services/openai/reference-preview
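
The API version requirement above can be checked in code before any request is sent. This is a minimal sketch that compares only the leading date in the version string and ignores any `-preview` suffix; `supports_o1` and `api_version_date` are hypothetical helpers, not SDK functions.

```python
from datetime import date

MINIMUM_API_VERSION = "2024-12-01-preview"  # from the limitations list above


def api_version_date(api_version):
    """Parse the leading YYYY-MM-DD part of an API version string."""
    year, month, day = api_version.split("-")[:3]
    return date(int(year), int(month), int(day))


def supports_o1(api_version, minimum=MINIMUM_API_VERSION):
    """Return True if the version's date is on or after the minimum's date."""
    return api_version_date(api_version) >= api_version_date(minimum)
```

For example, `supports_o1(aoai_api_version)` could be asserted right after the environment variables are loaded.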

In [1]:
%load_ext autoreload
%autoreload 2

import os, sys
module_path = "../util"
sys.path.append(os.path.abspath(module_path))

from common import check_kernel
check_kernel()

Kernel: python3110jvsc74a57bd04587453e809f8750004684e23b1ce9228c897ee509f3fc0227b41ec9a0169546


## 🧪 o1 multimodal with image URL

In [None]:
import os
import json
import openai
import base64
from openai import AzureOpenAI
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential, EnvironmentCredential
from azure.keyvault.secrets import SecretClient
from io import BytesIO
import gradio as gr
load_dotenv(override=True)

azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
# Treat an unset or empty variable as None
azure_openai_key = os.getenv("AZURE_OPENAI_KEY") or None
azure_openai_deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
azure_openai_embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-ada-002")
aoai_api_version = os.getenv("AZURE_OPENAI_API_VERSION") or None


# Use DefaultAzureCredential or InteractiveBrowserCredential to authenticate with Azure Key Vault
# tenant_id = os.getenv("AZURE_TENANT_ID")
# credential = DefaultAzureCredential()
# credential = InteractiveBrowserCredential(tenant_id=tenant_id)

# key_vault_url = os.getenv("AZURE_KEY_VAULT_URL")
# key_vault_secret_name = os.getenv("AZURE_KEY_VAULT_SECRET_NAME")

# kv_client = SecretClient(vault_url=key_vault_url, credential=credential)

# # Retrieve the OpenAI API key from Key Vault
# azure_openai_key = kv_client.get_secret(key_vault_secret_name).value

# Initialize the AzureOpenAI client with the retrieved key
try:
    client = AzureOpenAI(
        azure_endpoint=azure_openai_endpoint,
        api_key=azure_openai_key,
        api_version=aoai_api_version
    )
except (ValueError, TypeError) as e:
    print(e)

In [12]:
response = client.chat.completions.create(
    model=azure_openai_deployment_name,
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant to analyse images.",
        },
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Describe this picture:" 
            },
            { 
                "type": "image_url",
                "image_url": {
                    "url": "https://devblogs.microsoft.com/semantic-kernel/wp-content/uploads/sites/78/2023/09/semantic-kernel-in-prompt-flow-1.png"
                }
            }
        ] } 
    ],
    
)
print(response)
print(response.choices[0].message.content)
print("Usage Information:")
print(f"Cached Tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Completion Tokens: {response.usage.completion_tokens}")
print(f"Prompt Tokens: {response.usage.prompt_tokens}")
print(f"Total Tokens: {response.usage.total_tokens}")

ChatCompletion(id='chatcmpl-Ax3C4GE23qJN7hIrSgQHRfkOx0EF1', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='It is a simple vertical flowchart. At the top is a rounded shape labeled “Benchmark data,” with an arrow pointing downward into a large, light-gray box titled “Prompt flow.” Inside that box, there are three elements arranged in a vertical sequence: a small rectangle labeled “Input,” a large magenta oval labeled “Semantic Kernel,” and another small rectangle labeled “Output.” Each element is connected by downward arrows. Finally, another arrow leads from the bottom of the “Prompt flow” box to a rounded shape labeled “Results and metrics.”', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'protected_material_code': {'filtered': False, 'detected': False}, 'protected_material_text': {'filtered': False, 'detected': False}, 'sel
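
The usage-printing lines are repeated in several cells, so they could be collected into one helper. The sketch below also tolerates a missing `prompt_tokens_details` object, on the assumption that the field is optional in the SDK response. `summarize_usage` is a hypothetical name, and the numbers in the example are made up.

```python
from types import SimpleNamespace


def summarize_usage(usage):
    """Collect token counts into a dict, treating missing cache details as 0."""
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details is not None else None
    return {
        "cached_tokens": cached or 0,
        "completion_tokens": usage.completion_tokens,
        "prompt_tokens": usage.prompt_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in object with made-up numbers, used only to show the shape of the result.
example_usage = SimpleNamespace(
    prompt_tokens=120,
    completion_tokens=30,
    total_tokens=150,
    prompt_tokens_details=None,
)
example_summary = summarize_usage(example_usage)
```

With a real response, the call would be `summarize_usage(response.usage)`.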

## 🧪 o1 multimodal with local image (base64 encoding)

In [5]:
import base64
from mimetypes import guess_type

# Function to encode a local image into data URL 
def local_image_to_data_url(image_path):
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found

    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')

    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"

# Example usage
image_path = './images/semantic-kernel-in-prompt-flow-1.png'
data_url = local_image_to_data_url(image_path)
#print("Data URL:", data_url)
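
To sanity-check the encoding step, a data URL can be decoded back into its MIME type and raw bytes. This is a minimal sketch that handles only the `data:<mime>;base64,<payload>` form produced by `local_image_to_data_url`; `parse_data_url` is a hypothetical helper, and the sample bytes stand in for real image content.

```python
import base64


def parse_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a data URL of the form data:<mime>;base64,<payload>")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Round trip with a few made-up bytes standing in for real image content.
sample_bytes = b"\x89PNG\r\n\x1a\n"
sample_url = "data:image/png;base64," + base64.b64encode(sample_bytes).decode("utf-8")
mime_type, decoded = parse_data_url(sample_url)
```

If `decoded` differs from the bytes on disk, the problem is in the encoding step rather than in the API call.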

In [None]:
response = client.chat.completions.create(
    model=azure_openai_deployment_name,
    messages=[
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Describe this picture:" 
            },
            { 
                "type": "image_url",
                "image_url": {
                    "url": data_url 
                }
            }
        ] } 
    ],
    
)
print(response)
print(response.choices[0].message.content)
print("Usage Information:")
print(f"Cached Tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Completion Tokens: {response.usage.completion_tokens}")
print(f"Prompt Tokens: {response.usage.prompt_tokens}")
print(f"Total Tokens: {response.usage.total_tokens}")

## 🧪 o1 structured output 
Structured outputs make a model follow a JSON Schema definition that you provide as part of your inference API call. This contrasts with the older JSON mode feature, which guaranteed that valid JSON would be generated but could not ensure strict adherence to the supplied schema. Structured outputs are recommended for function calling, extracting structured data, and building complex multi-step workflows.

### Supported models (as of 2025-02-04) 
- o1 version: 2024-12-17
- gpt-4o-mini version: 2024-07-18
- gpt-4o version: 2024-08-06

In [None]:
from pydantic import BaseModel


class GetInformations(BaseModel):
    weather_location: str
    country_location: str
    country_language: str


tools = [openai.pydantic_function_tool(GetInformations)]

response = client.chat.completions.create(
    model=azure_openai_deployment_name,
    messages=[
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Hello, what is the current weather for the capital of South Korea?"
            }
        ] } 
    ],
    tools=tools
)
print(response)
print(response.choices[0].message.tool_calls[0].function)
print(response.choices[0].message.tool_calls[0].function.arguments)
print("Usage Information:")
print(f"Cached Tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Completion Tokens: {response.usage.completion_tokens}")
print(f"Prompt Tokens: {response.usage.prompt_tokens}")
print(f"Total Tokens: {response.usage.total_tokens}")

In [None]:
print(response.model_dump_json(indent=3))
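
The `function.arguments` value printed above is a JSON string, so it still has to be parsed before use. The sketch below does this with the standard library only and checks that exactly the expected fields are present. The `Information` dataclass mirrors the fields of `GetInformations`, and the sample payload is made up rather than an actual model response. In the notebook itself, `GetInformations.model_validate_json(arguments)` from pydantic should serve the same purpose.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Information:
    weather_location: str
    country_location: str
    country_language: str


def parse_arguments(arguments_json):
    """Parse a tool-call arguments string and verify its field names."""
    data = json.loads(arguments_json)
    expected = {f.name for f in fields(Information)}
    if set(data) != expected:
        raise ValueError(f"unexpected fields: {sorted(set(data) ^ expected)}")
    return Information(**data)


# Made-up example of an arguments payload; not an actual model response.
sample_arguments = (
    '{"weather_location": "Seoul", '
    '"country_location": "South Korea", '
    '"country_language": "Korean"}'
)
info = parse_arguments(sample_arguments)
```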

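## 🧪 o1 streaming output 
The cell below sets `stream=True`. With an o1 deployment the service rejects the request with the 400 error shown after the cell, because streaming is limited to the models listed under "Supported models".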

### Supported models (as of 2025-02-04) 
- o3-mini version: 2025-01-31

In [14]:
from pydantic import BaseModel


class GetInformations(BaseModel):
    weather_location: str
    country_location: str
    country_language: str


#tools = [openai.pydantic_function_tool(GetInformations)]

stream = client.chat.completions.create(
    model=azure_openai_deployment_name,
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant to analyse images.",
        },
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Hello, what is the current weather for the capital of South Korea?"
            }
        ] } 
    ],
    stream=True,
    #tools=tools
)

for chunk in stream:
    # Azure can send chunks with an empty choices list, so guard before indexing
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")


BadRequestError: Error code: 400 - {'error': {'message': "Unsupported value: 'stream' does not support true with this model. Supported values are: false.", 'type': 'invalid_request_error', 'param': 'stream', 'code': 'unsupported_value'}}
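
One way to handle the 400 error above is to attempt a streaming call and retry without streaming when it is rejected. This is a minimal sketch, assuming the error is raised when the call is made rather than while iterating over the stream. `create_with_stream_fallback` is a hypothetical helper, and `fake_create` is a stand-in for `client.chat.completions.create`.

```python
def create_with_stream_fallback(create, unsupported_error=Exception, **kwargs):
    """Try a streaming call first; retry without streaming if it is rejected.

    Returns a (streamed, result) tuple so the caller knows whether `result`
    is an iterable of chunks or a complete response.
    """
    try:
        return True, create(stream=True, **kwargs)
    except unsupported_error:
        return False, create(stream=False, **kwargs)


# Stand-in for client.chat.completions.create that rejects streaming,
# mimicking the 400 error above. Purely illustrative.
def fake_create(stream=False, **kwargs):
    if stream:
        raise ValueError("'stream' does not support true with this model")
    return "full response"


streamed, result = create_with_stream_fallback(
    fake_create, unsupported_error=ValueError, model="o1"
)
```

With the real client, `create` would be `client.chat.completions.create` and `unsupported_error` would be `openai.BadRequestError`. That exception covers every 400 response, so the fallback also retries on unrelated bad requests.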

## 🚀 o1 playground sample  
![o1 playground sample](images/o1-playground-sample.png)

In [6]:
import base64

def o1_webapp_gradio(pil_image, prompt):
    """
    o1 model integration for analyzing images.
    """
    # Save image to an in-memory buffer.
    # JPEG has no alpha channel, so convert to RGB first (e.g. for RGBA PNG uploads).
    buffer = BytesIO()
    pil_image.convert("RGB").save(buffer, format="JPEG")
    buffer.seek(0)

    # Encode the image to base64
    image_data = base64.b64encode(buffer.read()).decode('utf-8')
    image_url = f"data:image/jpeg;base64,{image_data}"

    # Construct the messages
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                { 
                    "type": "image_url",
                    "image_url": {
                        "url": image_url 
                    }
                }
            ],
        },
    ]

    # Call the Azure OpenAI API
    response = client.chat.completions.create(
        model=azure_openai_deployment_name,
        messages=messages
    )

    return response.choices[0].message.content

In [None]:
image_url = "https://cdn-dynmedia-1.microsoft.com/is/image/microsoftcorp/UHFbanner-MSlogo?fmt=png-alpha&bfc=off&qlt=100,1"
logo = "<center> <img src= {} width=100px></center>".format(image_url)
title = "o1 model Playground"

inputs = [
    gr.Image(type="pil", label="Your image"),
    gr.Text(label="Your prompt", value="Describe this image:"),
]
outputs = [
    gr.Text(label=f"{azure_openai_deployment_name} results"),
]
theme = "gradio/monochrome"  # https://huggingface.co/spaces/gradio/theme-gallery

o1_webapp = gr.Interface(
    fn=o1_webapp_gradio,
    inputs=inputs,
    outputs=outputs,
    description=logo,
    title=title,
    theme=theme,
)

o1_webapp.launch(share=True)