Image Analysis with Azure Computer Vision 4.0: Captioning and Dense Captioning

"Caption" replaces "Describe" in V4.0 as a significantly improved image captioning feature, rich in detail and semantic understanding.

Dense Captions provides more detail by generating one-sentence descriptions of up to 10 regions of the image, in addition to describing the whole image.
Dense Captions also returns bounding box coordinates of the described image regions.
There's also a new gender-neutral parameter to allow customers to choose whether to enable probabilistic gender inference for alt-text and Seeing AI applications.
Together, these features can automatically deliver rich captions, accessible alt-text, SEO optimization, and intelligent photo curation to support digital content.
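
As a rough illustration of how the caption, dense caption, and gender-neutral options might be requested together, here is a minimal sketch that only builds the request URL. The path `computervision/imageanalysis:analyze`, the API version `2023-02-01-preview`, the feature names `caption` and `denseCaptions`, and the query parameter `gender-neutral-caption` are assumptions based on the preview REST API at the time of writing. The helper name `build_analyze_url` and the example endpoint are hypothetical. Check the documentation linked below before relying on any of them, since the API is in preview.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, gender_neutral=False,
                      api_version="2023-02-01-preview"):
    """Build an Image Analysis request URL (hypothetical helper).

    All parameter names below are assumptions about the preview API.
    """
    params = {
        "api-version": api_version,
        # Ask for the whole-image caption and the per-region dense captions.
        "features": "caption,denseCaptions",
        # "true" asks the service to avoid gendered terms in captions.
        "gender-neutral-caption": str(gender_neutral).lower(),
    }
    base = endpoint.rstrip("/") + "/computervision/imageanalysis:analyze"
    # safe="," keeps the comma-separated feature list unescaped.
    return base + "?" + urlencode(params, safe=",")


# Example with a placeholder endpoint (not a real resource).
url = build_analyze_url("https://example.cognitiveservices.azure.com/",
                        gender_neutral=True)
print(url)
```

The URL would then be used in a POST request carrying the image (or an image URL) and the resource key in the `Ocp-Apim-Subscription-Key` header; that part is left out so the sketch runs without credentials.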

This service is currently in preview, and the API may change in the future.

https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-describe-images-40?tabs=image

Python demo notebook

Results

Dense captions from the image
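
To show how the dense captions and their bounding boxes could be read out of a response, here is a small sketch. It assumes the JSON response has a `denseCaptionsResult.values` list whose items carry `text`, `confidence`, and a `boundingBox` with `x`, `y`, `w`, `h`; that shape is an assumption about the preview API. The function name `extract_dense_captions` is hypothetical, and the sample payload is made up for illustration (it is not output from the notebook).

```python
def extract_dense_captions(result):
    """Return (text, confidence, (x, y, w, h)) for each described region.

    Assumes the response shape described above; missing keys fall back
    to empty or zero values instead of raising.
    """
    values = result.get("denseCaptionsResult", {}).get("values", [])
    regions = []
    for item in values:
        box = item.get("boundingBox", {})
        regions.append((
            item.get("text", ""),
            item.get("confidence", 0.0),
            (box.get("x", 0), box.get("y", 0), box.get("w", 0), box.get("h", 0)),
        ))
    return regions


# Made-up payload in the assumed response shape, for illustration only.
sample = {
    "denseCaptionsResult": {
        "values": [
            {"text": "a dog sitting on grass", "confidence": 0.5,
             "boundingBox": {"x": 0, "y": 0, "w": 640, "h": 480}},
            {"text": "a red ball", "confidence": 0.25,
             "boundingBox": {"x": 300, "y": 350, "w": 60, "h": 60}},
        ]
    }
}

for text, confidence, (x, y, w, h) in extract_dense_captions(sample):
    print(f"{text} ({confidence:.2f}) at x={x}, y={y}, w={w}, h={h}")
```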

13-Mar-2023 Serge Retkowsky | serge.retkowsky@microsoft.com | https://www.linkedin.com/in/serger/