# Extract image content with Azure OpenAI GPT-4o


## Requirements

* [Azure OpenAI Service](https://azure.microsoft.com/en-us/products/ai-services/openai-service)
  * GPT-4o
* Python environment, version 3.10 or higher
* Visual Studio Code
  * Extensions: Python and Jupyter

In [None]:
# Python packages
# ! pip install -r requirements.txt

In [1]:
# Libraries
import os
from dotenv import load_dotenv

# Import Utility Functions
from utils import (
    word_wrap,
    local_image_to_data_url    
)

# OpenAI Python libraries
from openai import AzureOpenAI

In [None]:
# Load environment variables from the .env file
load_dotenv()

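# Variables - Azure Services
# AZURE_OPENAI_ACCOUNT is passed as azure_endpoint when the client is created,
# so it should hold the full endpoint URL of the Azure OpenAI resource
AZURE_OPENAI_ACCOUNT = os.environ["AZURE_OPENAI_ACCOUNT"]
AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_KEY"]

# Variables - Names
# In Azure OpenAI, the "model" argument is the name of the model deployment
azure_openai_gpt4o_name = "gpt-4o"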
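
`os.environ[...]` raises a bare `KeyError` when a variable is missing, which can be hard to read in a notebook. A minimal sketch of a friendlier check, assuming the `.env` file defines the two variables used above. The helper name `require_env` is hypothetical and not part of this repository.

```python
import os


def require_env(names):
    """Return a dict of the requested environment variables.

    Raises a single RuntimeError that lists every missing or empty variable.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: os.environ[name] for name in names}
```

It could be called right after `load_dotenv()`, for example `settings = require_env(["AZURE_OPENAI_ACCOUNT", "AZURE_OPENAI_KEY"])`.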

## Azure OpenAI Call


### Create the AzureOpenAI client

In [4]:
# Create an Azure OpenAI client object
# Python AzureOpenAI class: https://github.com/openai/openai-python?tab=readme-ov-file#microsoft-azure-openai
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    api_key=AZURE_OPENAI_KEY,
)

### AzureOpenAI call using a local image

1. First, convert the image file to a base64-encoded data URL so it can be passed to the API.
1. Send the data URL to the Azure OpenAI API in the `image_url` field.
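
The `local_image_to_data_url` helper is imported from `utils` and its source is not shown here. The conversion in step 1 can be sketched as follows, assuming the helper does something similar. The function name `image_to_data_url_sketch` is hypothetical, chosen to avoid shadowing the real helper.

```python
import base64
from mimetypes import guess_type


def image_to_data_url_sketch(image_path):
    """Encode a local image file as a base64 data URL."""
    # Guess the MIME type from the file extension; fall back to a generic type
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = "application/octet-stream"

    # Read the raw bytes and base64-encode them
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")

    # Data URL format: data:<mime type>;base64,<payload>
    return f"data:{mime_type};base64,{encoded}"
```

The resulting string is what goes into the `url` key of the `image_url` content part.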

In [None]:
# Add and test local image 1
image_path='../images-lab-tests/seattle-pikeplace-1.jpg'
data_url=local_image_to_data_url(image_path)
#print("Data URL:", data_url)

In [None]:
response=openai_client.chat.completions.create(
    model=azure_openai_gpt4o_name,
    messages=[
        { 
            "role": "system", 
            "content": "You are a Visual Cognitive system tasked with extracting text and information from images." 
        },
        { 
            "role": "user", 
            "content": [  
                {
                    "type": "text",
                    # Adjacent string literals are joined, so no indentation leaks into the prompt
                    "text": (
                        "Extract all the following data in different sections from the provided "
                        "image and compile it into a paragraph: Brief description, Entities, and Text in the image."
                    )
                },
                { 
                    "type": "image_url",
                    "image_url": {
                        "url": data_url                        
                    }
                }
            ] 
        } 
    ],
    # Set max_tokens explicitly; if the limit is too low, the response may be cut off
    max_tokens=3000 
)

print(f"Azure OpenAI message content only:\n{word_wrap(response.choices[0].message.content)}")

Azure OpenAI message content only:
**Brief Description:**  
The image showcases the iconic Pike Place Market in Seattle, Washington, a
popular tourist destination known for its vibrant atmosphere, local produce, and unique shops. The
scene features the famous "Public Market Center" neon sign with its clock, bustling market
entrances, and wet brick-paved streets, indicating rainy weather typical of the
region.

**Entities:**  
- Pike Place Market  
- "Public Market Center" sign  
- "Meet the Producer" sign  
- "Farmers Market" entrance  
- Seattle, Washington  
- Clock on the "Public Market Center" sign  
- Cars parked in front of the market  
- Individuals walking near the market  

**Text in the Image:**  
- "PUBLIC MARKET CENTER"  
- "FARMERS MARKET"  
- "MEET THE PRODUCER"  
- "Seattle, Washington"  

**Compiled Paragraph:**  
The image depicts Pike Place Market, a
hallmark of Seattle, Washington, characterized by its vibrant surroundings and community-focused
atmosphere. At the hea
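
The sample output above ends mid-sentence. Whatever the cause here, a reply that was cut short by `max_tokens` can be detected from the `finish_reason` field of the response, which is `"length"` in that case and `"stop"` when the model finished on its own. A minimal sketch follows. The helper name `was_truncated` is hypothetical, and the stand-in object only mimics the shape of a real response.

```python
from types import SimpleNamespace


def was_truncated(response):
    """Return True if the first choice stopped because it hit the token limit."""
    # finish_reason is "length" when generation stopped at max_tokens
    return response.choices[0].finish_reason == "length"


# Stand-in object with the same attribute layout as a chat completion response
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="length")]
)
print(was_truncated(fake_response))  # True
```

With a real `response` object, `was_truncated(response)` could be checked before printing, and `max_tokens` raised if it returns `True`.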

### AzureOpenAI call using a URL

In [8]:
# Add the image URL
url = 'https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png'
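
The request for a remote image has the same shape as the local-image request: only the value of `url` changes from a data URL to an HTTP(S) URL. A small sketch of a helper that builds the `messages` list for either case. The name `build_image_messages` is hypothetical and not part of `utils`.

```python
def build_image_messages(system_prompt, user_prompt, image_url):
    """Build the messages list for a chat completion with one image.

    image_url may be a public HTTP(S) URL or a base64 data URL.
    """
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]
```

The result can be passed as `messages=` to `openai_client.chat.completions.create(...)`.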

In [9]:
#query="Extract all the text from the provided image and compile it into a paragraph."

response=openai_client.chat.completions.create(
    model=azure_openai_gpt4o_name,
    messages=[
        { 
            "role": "system", 
            "content": "You are a Visual Cognitive system tasked with extracting text and information from images." 
        },
        { 
            "role": "user", 
            "content": [  
                {
                    "type": "text",
                    # Adjacent string literals are joined, so no indentation leaks into the prompt
                    "text": (
                        "Using the image provided by image_url, get the following information: "
                        "Brief description, Tags, and Text in the image."
                    )
                },
                { 
                    "type": "image_url",
                    "image_url": {
                        "url": url                        
                    }
                }
            ] 
        } 
    ],
    # Set max_tokens explicitly; if the limit is too low, the response may be cut off
    max_tokens=3000 
)

print(f"Azure OpenAI message content only:\n{word_wrap(response.choices[0].message.content)}")

Azure OpenAI message content only:
**Brief description:**  
The image shows a modern office environment with a large interactive
display screen. The screen depicts a scheduling or meeting interface, and a person in a yellow
sweater uses touch gestures to interact with it. The background includes plants, office furniture,
and a whiteboard.

**Tags:**  
Interactive display, office technology, touchscreen, meeting
interface, scheduling, desk, modern workspace, plants, presentation.

**Text in the image:**  
- **9:35 AM**  
  Conference room | 56498554  
  555-123-4957  

- **Town Hall**  
  9:00 AM - 10:00 AM  
  Aaron Buxton  
  *Join*  

- **Daily SCRUM**  
  10:00 AM - 11:00 AM  
  Charlotte De Courn  

- **Quarterly All Hands**  
  11:00 AM - 12:00 PM  
  Sohail Sharma  

- **Weekly Stand-up**  
  12:00 PM - 12:30 PM  
  Danielle Manara  

- **Product review**  
