# Azure Computer Vision: Image Analysis 4.0

[Quickstart: Image Analysis 4.0](https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/quickstarts-sdk/image-analysis-client-library-40?tabs=visual-studio%2Cwindows&pivots=programming-language-python)

## Requirements

* [Azure Computer Vision](https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision)
* Python environment, version 3.10 or higher
* Visual Studio Code
  * Extensions: Python and Jupyter

In [None]:
# Python packages
#! pip install -r requirements.txt

In [3]:
# Libraries
import os
from dotenv import load_dotenv

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

In [4]:
# Load variables from the .env file into the environment
load_dotenv()

# Azure AI services endpoint and key
endpoint = os.environ["AZURE_AI_MULTISERVICE_ENDPOINT"]
subscription_key = os.environ["AZURE_AI_MULTISERVICE_KEY"]
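
Indexing `os.environ` raises a bare `KeyError` when a variable is missing, so it can help to check both settings up front and report them together. The following is a minimal sketch, assuming the two variable names used above; `require_env` is a hypothetical helper, not part of `python-dotenv` or any Azure library, and the demo values are placeholders.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for the given variables, or raise listing every missing one."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty variables the same way
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


REQUIRED = ["AZURE_AI_MULTISERVICE_ENDPOINT", "AZURE_AI_MULTISERVICE_KEY"]

# Stand-in mapping for the demo; in the notebook, call require_env(REQUIRED) after load_dotenv()
demo_env = {
    "AZURE_AI_MULTISERVICE_ENDPOINT": "https://example.invalid/",  # placeholder, not a real endpoint
    "AZURE_AI_MULTISERVICE_KEY": "not-a-real-key",  # placeholder, not a real key
}
settings = require_env(REQUIRED, demo_env)
print(sorted(settings))
```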

## Analyze image

To authenticate against the Image Analysis service, you need a key and an endpoint URL for your Azure resource. This guide assumes that you've defined the environment variables `AZURE_AI_MULTISERVICE_KEY` and `AZURE_AI_MULTISERVICE_ENDPOINT` (for example, in a `.env` file) with your key and endpoint.

In [5]:
# Create an Image Analysis client
# Python ImageAnalysisClient Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.imageanalysisclient?view=azure-python
client = ImageAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(subscription_key)
)

To get Image Analysis results, complete the following steps:
* Select the image to analyze
  * You can select an image by providing a publicly accessible image URL, or by [reading image data into the SDK's input buffer](https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/how-to/call-analyze-image-40?pivots=programming-language-python#image-buffer).
* Select visual features
* Optionally, configure smart-cropping aspect ratios, `gender_neutral_caption`, and `language` when sending the image for analysis.
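
The optional settings in the last step can be collected in one place before the call. The sketch below is illustrative only: `build_analyze_options` is a hypothetical helper, and the 0.75 to 1.8 range for aspect ratios is an assumption taken from the service documentation that should be checked against the current limits.

```python
def build_analyze_options(smart_crops_aspect_ratios=None,
                          gender_neutral_caption=False,
                          language="en"):
    """Collect optional keyword arguments for an analyze call (hypothetical helper)."""
    options = {
        "gender_neutral_caption": gender_neutral_caption,
        "language": language,
    }
    if smart_crops_aspect_ratios is not None:
        for ratio in smart_crops_aspect_ratios:
            # Assumed service limit: width/height ratio between 0.75 and 1.8 inclusive
            if not 0.75 <= ratio <= 1.8:
                raise ValueError(f"Aspect ratio {ratio} is outside the assumed range 0.75-1.8")
        options["smart_crops_aspect_ratios"] = list(smart_crops_aspect_ratios)
    return options


options = build_analyze_options(smart_crops_aspect_ratios=[0.9, 1.33],
                                gender_neutral_caption=True)
print(options)

# With a client and an image available, the options could then be passed along, e.g.:
#   client.analyze(image_data=image_data, visual_features=[...], **options)
#   client.analyze_from_url(image_url="<publicly accessible URL>", visual_features=[...], **options)
```

Keeping the options in a plain dictionary makes it easy to reuse the same settings for both the URL-based and the buffer-based call.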


In [None]:
# Load image to analyze into a 'bytes' object
with open("../images-lab-tests/seattle-pikeplace-1.jpg", "rb") as f:
    image_data = f.read()
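
Before sending the bytes, a quick local check can catch an empty or oversized file early. This is a rough sketch under stated assumptions: the 20 MB limit is taken from the service documentation and may change, the signature table covers only a few common formats, and `sniff_image` is a hypothetical helper rather than an SDK function.

```python
MAX_BYTES = 20 * 1024 * 1024  # assumed service limit of 20 MB

# Leading "magic" bytes for a few common formats (not an exhaustive list)
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF8": "gif",
    b"BM": "bmp",
}


def sniff_image(data):
    """Return a format name guessed from the leading bytes, or None if unrecognized."""
    if len(data) == 0:
        raise ValueError("Image data is empty")
    if len(data) > MAX_BYTES:
        raise ValueError(f"Image is {len(data)} bytes, above the assumed {MAX_BYTES}-byte limit")
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


# Demo with synthetic header bytes; in the notebook, call sniff_image(image_data)
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16
print(sniff_image(fake_jpeg))
```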

In [11]:
# Analyze the image for several visual features. This is a synchronous (blocking) call.
# Python ImageAnalysisClient.analyze method: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.imageanalysisclient?view=azure-python
#   Visual Features: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.visualfeatures?view=azure-python
result = client.analyze(
    image_data=image_data,
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.READ,
        VisualFeatures.DENSE_CAPTIONS,
        VisualFeatures.OBJECTS,
        VisualFeatures.PEOPLE,
        VisualFeatures.SMART_CROPS,
        VisualFeatures.TAGS,
    ],
    # gender_neutral_caption=True,  # Optional (default is False)
)

# Python ImageAnalysisClient results: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models?view=azure-python
print("Image analysis results:")

# Python CaptionResult Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.captionresult?view=azure-python
print("\n Captions:")
if result.caption is not None:
    # The '.4f' format specifier formats the confidence score to have 4 decimal places
    print(f"   '{result.caption.text}', Confidence {result.caption.confidence:.4f}")

# Python DenseCaption Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.densecaption?view=azure-python#azure-ai-vision-imageanalysis-models-densecaption-values
print("\n Dense captions:")
if result.dense_captions is not None:
    # The 'list' attribute holds the DenseCaption items returned by the service
    for densecaption in result.dense_captions.list:
        print(f"   '{densecaption.text}', Confidence {densecaption.confidence:.4f}, {densecaption.bounding_box}")

# Print text (OCR) analysis results to the console
# Python ReadResult Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.readresult?view=azure-python
print("\n Read-OCR:")
if result.read is not None:
    # Python DetectedTextBlock Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.detectedtextblock?view=azure-python
    # Python DetectedTextLine Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.detectedtextline?view=azure-python
    # Python DetectedTextWord Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.detectedtextword?view=azure-python
    # Iterate over every block instead of assuming that blocks[0] exists
    for block in result.read.blocks:
        for line in block.lines:
            print(f"   Line: '{line.text}', Bounding box {line.bounding_polygon}")
            for word in line.words:
                print(f"     Word: '{word.text}', Bounding polygon {word.bounding_polygon}, Confidence {word.confidence:.4f}")

# Python ObjectsResult Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.objectsresult?view=azure-python#azure-ai-vision-imageanalysis-models-objectsresult-get
# Python DetectedObject Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.detectedobject?view=azure-python
print("\n Objects:")
if result.objects is not None:
    for i, detectedobjects in enumerate(result.objects.get("values")):
        print(f"  Object {i + 1}:")
        for k, v in detectedobjects.items():
            print(f"   '{k}: {v}'")

# Python PeopleResult Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.peopleresult?view=azure-python
# Python DetectedPerson Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.detectedperson?view=azure-python
print("\n People:")
if result.people is not None:
    for i, detectedperson in enumerate(result.people.get("values")):
        print(f"  Person {i + 1}:")
        for k, v in detectedperson.items():
            print(f"   '{k}: {v}'")

# Python SmartCropsResult Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.smartcropsresult?view=azure-python
# Python CropRegion Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.cropregion?view=azure-python
print("\n Smart_crops:")
if result.smart_crops is not None:
    for smart_crops in result.smart_crops.get("values"):
        for k, v in smart_crops.items():
            print(f"   '{k}: {v}'")

# Python TagsResult Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.tagsresult?view=azure-python
# Python DetectedTag Class: https://learn.microsoft.com/en-us/python/api/azure-ai-vision-imageanalysis/azure.ai.vision.imageanalysis.models.detectedtag?view=azure-python
print("\n Tags:")
if result.tags is not None:
    for i, detectedtags in enumerate(result.tags.get("values")):
        print(f"  Tag {i + 1}:")
        for k, v in detectedtags.items():
            print(f"   '{k}: {v}'")

Image analysis results:

 Captions:
   'a group of cars parked outside of a market with Pike Place Market in the background', Confidence 0.6851

 Dense captions:
   'a group of cars parked outside of a market', Confidence 0.6851, {'x': 0, 'y': 0, 'w': 700, 'h': 500}
   'a clock with red hands', Confidence 0.7682, {'x': 278, 'y': 105, 'w': 75, 'h': 74}
   'a close up of a car', Confidence 0.8497, {'x': 112, 'y': 315, 'w': 197, 'h': 65}
   'a car parked in a parking lot', Confidence 0.7255, {'x': 0, 'y': 331, 'w': 275, 'h': 95}
   'a car parked on a brick road', Confidence 0.7291, {'x': 19, 'y': 342, 'w': 648, 'h': 152}
   'a sign with a clock on it', Confidence 0.7288, {'x': 68, 'y': 47, 'w': 281, 'h': 185}

 Read-OCR:
   Line: 'PUBLIC', Bounding box [{'x': 126, 'y': 59}, {'x': 313, 'y': 59}, {'x': 314, 'y': 106}, {'x': 127, 'y': 111}]
     Word: 'PUBLIC', Bounding polygon [{'x': 126, 'y': 60}, {'x': 304, 'y': 62}, {'x': 302, 'y': 102}, {'x': 128, 'y': 112}], Confidence 0.8800
   Line: 
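
The geometry printed above comes in two shapes: dense captions use an `x`/`y`/`w`/`h` box, while OCR lines and words use a four-point polygon. For cropping or drawing, both are often easier to handle as `(left, top, right, bottom)` corners. The sketch below works on plain dictionaries copied from the output above (confidences rounded as printed); the helper names are hypothetical and nothing here calls the service.

```python
# Sample values copied from the recorded output above
dense_captions = [
    {"text": "a group of cars parked outside of a market", "confidence": 0.6851,
     "box": {"x": 0, "y": 0, "w": 700, "h": 500}},
    {"text": "a clock with red hands", "confidence": 0.7682,
     "box": {"x": 278, "y": 105, "w": 75, "h": 74}},
    {"text": "a close up of a car", "confidence": 0.8497,
     "box": {"x": 112, "y": 315, "w": 197, "h": 65}},
]
public_line_polygon = [{"x": 126, "y": 59}, {"x": 313, "y": 59},
                       {"x": 314, "y": 106}, {"x": 127, "y": 111}]


def box_to_corners(box):
    """Convert an x/y/w/h box to (left, top, right, bottom)."""
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


def polygon_to_corners(polygon):
    """Return the axis-aligned rectangle that encloses all polygon points."""
    xs = [point["x"] for point in polygon]
    ys = [point["y"] for point in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def filter_by_confidence(items, threshold):
    """Keep only items whose confidence is at or above the threshold."""
    return [item for item in items if item["confidence"] >= threshold]


confident = filter_by_confidence(dense_captions, 0.75)
for caption in confident:
    print(caption["text"], box_to_corners(caption["box"]))
print("PUBLIC", polygon_to_corners(public_line_polygon))
```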
