***
# OCI AI Vision

***

## Overview:

Oracle Cloud Infrastructure Vision is a serverless, multi-tenant service, accessible using the Console, REST APIs, SDK, or CLI.

You can upload images to detect and classify objects in them. If you have lots of images, you can process them in batch using asynchronous API endpoints. Vision's features are thematically split between Document AI for document-centric images, and Image Analysis for object and scene-based images. Pretrained models and custom models are supported.

In this notebook, we walk through the syntax for invoking the different pretrained models through AIServiceVisionClient. The notebook aims to clarify each feature's requirements, usage, and API output.

Select the published conda environment data-science-gmlv1_0_v1 before proceeding.

In [2]:
import io
import warnings
import logging
import os
from os import path 
from os.path import expanduser
from os.path import join

import oci

import ads 
ads.set_auth(auth='resource_principal') 

warnings.filterwarnings('ignore')
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

In [3]:
# ADS version used in this notebook: 
print(ads.__version__)

2.8.11


###  Create a default config using DEFAULT profile in default location
Refer to https://docs.cloud.oracle.com/en-us/iaas/Content/API/Concepts/sdkconfig.htm#SDK_and_CLI_Configuration_File for more information.

In [4]:
# config = oci.config.from_file()

In [5]:
# The config is passed as a dictionary at runtime here, which is not a safe approach. Please switch to loading the config from a file (see the cell above).
# I did not have the access permissions needed to create and write a ~/.oci/config file.
config = {
    "user": "ocid1.user.oc1..aaaaaaaajp44knbcz4bwojkczqtiqigjqc3zvekl3ovzwgquarkdbqog5xxa",
    "key_file": "/home/datascience/api-priv-10-31.pem",
    "fingerprint": "c6:cf:a1:76:67:91:b6:38:9f:4b:97:41:42:8d:a5:17",
    "tenancy": "ocid1.tenancy.oc1..aaaaaaaawprhiv4vxq5vlj7f27kyidr675jdepem47s7mzfe6tfhxbpo74xq",
    "region": "us-phoenix-1"
}

from oci.config import validate_config
validate_config(config)

### Initialize service client with the config

In [6]:
ai_vision_client = oci.ai_vision.AIServiceVisionClient(config)

## Vision provides pretrained document AI models that allow you to organize and extract text and structure from business documents.

### Create AI service vision client and get response object using Document

### Using DocumentClassificationFeature

Vision provides a list of possible document types for the analyzed document, each with a confidence score between 0 and 1. Scores closer to 1 indicate higher confidence in the predicted document type, while lower scores indicate lower confidence. Some of the possible document types are:
* Invoice
* Receipt
* Resume or CV
* Tax form
* Driver's license
* Passport
* Bank statement
* Check
* Payslip
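
To make the confidence scores easier to act on, the snippet below picks the best-scoring document type from a response. It is a minimal sketch that works on a plain dictionary shaped like the JSON printed by the cells in this section. The helper name `top_document_type` and the `min_confidence` threshold are illustrative assumptions, not part of the SDK.

```python
from typing import Optional, Tuple


def top_document_type(response: dict, min_confidence: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (document_type, confidence) for the best-scoring type, or None.

    `response` is assumed to be a plain dict with a "detected_document_types"
    list, as in the printed output of analyze_document.
    """
    candidates = response.get("detected_document_types") or []
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["confidence"])
    if best["confidence"] < min_confidence:
        return None
    return best["document_type"], best["confidence"]


# Sample data copied from the shape of the printed response.
sample = {
    "detected_document_types": [
        {"confidence": 0.9999981, "document_type": "RECEIPT"}
    ]
}
print(top_document_type(sample))
```

With the SDK, the response model can likely be converted to such a dictionary with `oci.util.to_dict(analyze_document_response.data)` before calling the helper.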

In [7]:
analyze_document_response = ai_vision_client.analyze_document(
    analyze_document_details=oci.ai_vision.models.AnalyzeDocumentDetails(
        features=[
            oci.ai_vision.models.DocumentClassificationFeature(
                feature_type="DOCUMENT_CLASSIFICATION")
        ],
        compartment_id="ocid1.compartment.oc1..aaaaaaaabu6pgqbwe4are4ke7uzkq44rbvbnxwhybhmplialatq54kdvq4jq",
        document=oci.ai_vision.models.ObjectStorageDocumentDetails(
            source="OBJECT_STORAGE",
            namespace_name="ocuocictrng22",
            bucket_name="DSP-VISION",
            object_name="r2.pdf")))

# Get the data from response
print(analyze_document_response.data)

{
  "detected_document_types": [
    {
      "confidence": 0.9999981,
      "document_type": "RECEIPT"
    }
  ],
  "detected_languages": null,
  "document_classification_model_version": "1.5.114",
  "document_metadata": {
    "mime_type": "application/pdf",
    "page_count": 1
  },
  "errors": null,
  "key_value_detection_model_version": null,
  "language_classification_model_version": null,
  "pages": [
    {
      "detected_document_types": [
        {
          "confidence": 0.9999981,
          "document_type": "RECEIPT"
        },
        {
          "confidence": 1.9356146e-06,
          "document_type": "OTHERS"
        },
        {
          "confidence": 6.727421e-12,
          "document_type": "INVOICE"
        },
        {
          "confidence": 5.9690973e-16,
          "document_type": "PASSPORT"
        },
        {
          "confidence": 2.966293e-16,
          "document_type": "RESUME"
        }
      ],
      "detected_languages": null,
      "dimensions": null,
    

In [8]:
analyze_document_response = ai_vision_client.analyze_document(
    analyze_document_details=oci.ai_vision.models.AnalyzeDocumentDetails(
        features=[
            oci.ai_vision.models.DocumentClassificationFeature(
                feature_type="DOCUMENT_CLASSIFICATION")
        ],
        compartment_id="ocid1.compartment.oc1..aaaaaaaabu6pgqbwe4are4ke7uzkq44rbvbnxwhybhmplialatq54kdvq4jq",
        document=oci.ai_vision.models.ObjectStorageDocumentDetails(
            source="OBJECT_STORAGE",
            namespace_name="ocuocictrng22",
            bucket_name="DSP-VISION",
            object_name="r1.jpg")))

# Get the data from response
print(analyze_document_response.data)

{
  "detected_document_types": [
    {
      "confidence": 0.9999996,
      "document_type": "RECEIPT"
    }
  ],
  "detected_languages": null,
  "document_classification_model_version": "1.5.114",
  "document_metadata": {
    "mime_type": "image/jpeg",
    "page_count": 1
  },
  "errors": null,
  "key_value_detection_model_version": null,
  "language_classification_model_version": null,
  "pages": [
    {
      "detected_document_types": [
        {
          "confidence": 0.9999996,
          "document_type": "RECEIPT"
        },
        {
          "confidence": 3.9858188e-07,
          "document_type": "OTHERS"
        },
        {
          "confidence": 7.239463e-10,
          "document_type": "INVOICE"
        },
        {
          "confidence": 5.1755284e-15,
          "document_type": "TAX_FORM"
        },
        {
          "confidence": 5.7858375e-16,
          "document_type": "RESUME"
        }
      ],
      "detected_languages": null,
      "dimensions": null,
      "d

### Using DocumentTableDetectionFeature

Vision provides the number of rows and columns for the table and the contents of each table cell. Each cell has a confidence score between 0 and 1. Scores closer to 1 indicate higher confidence in the extracted text, while lower scores indicate lower confidence.
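
The row, column, and cell information can be rearranged into a simple grid for inspection. The following is a sketch under stated assumptions: the printed output in this notebook is truncated before any table appears, so the field names `row_count`, `column_count`, `header_rows`, `body_rows`, `footer_rows`, `cells`, `row_index`, `column_index`, and `text` are assumed from the SDK's table models, and `row_index` is assumed to count rows across the whole table. The helper `table_to_grid` and the sample table are hypothetical.

```python
from typing import List


def table_to_grid(table: dict) -> List[List[str]]:
    """Arrange cell texts into a row-major grid of strings.

    Assumes each cell carries table-wide "row_index" and "column_index"
    values (an assumption, not confirmed by the truncated output).
    """
    grid = [["" for _ in range(table["column_count"])]
            for _ in range(table["row_count"])]
    for section in ("header_rows", "body_rows", "footer_rows"):
        for row in table.get(section) or []:
            for cell in row.get("cells") or []:
                grid[cell["row_index"]][cell["column_index"]] = cell.get("text") or ""
    return grid


# Hypothetical table, hand-written for illustration only.
sample_table = {
    "row_count": 2,
    "column_count": 2,
    "header_rows": [{"cells": [
        {"row_index": 0, "column_index": 0, "text": "Item"},
        {"row_index": 0, "column_index": 1, "text": "Price"},
    ]}],
    "body_rows": [{"cells": [
        {"row_index": 1, "column_index": 0, "text": "Coffee"},
        {"row_index": 1, "column_index": 1, "text": "3.50"},
    ]}],
}
for row in table_to_grid(sample_table):
    print(row)
```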

In [9]:
analyze_document_response = ai_vision_client.analyze_document(
    analyze_document_details=oci.ai_vision.models.AnalyzeDocumentDetails(
        features=[
            oci.ai_vision.models.DocumentTableDetectionFeature(
                feature_type="TABLE_DETECTION")
        ],
        compartment_id="ocid1.compartment.oc1..aaaaaaaabu6pgqbwe4are4ke7uzkq44rbvbnxwhybhmplialatq54kdvq4jq",
        document=oci.ai_vision.models.ObjectStorageDocumentDetails(
            source="OBJECT_STORAGE",
            namespace_name="ocuocictrng22",
            bucket_name="DSP-VISION",
            object_name="r4.png")))

# Get the data from response
print(analyze_document_response.data)

{
  "detected_document_types": null,
  "detected_languages": null,
  "document_classification_model_version": null,
  "document_metadata": {
    "mime_type": "image/png",
    "page_count": 1
  },
  "errors": null,
  "key_value_detection_model_version": null,
  "language_classification_model_version": null,
  "pages": [
    {
      "detected_document_types": null,
      "detected_languages": null,
      "dimensions": {
        "height": 745.0,
        "unit": "PIXEL",
        "width": 875.0
      },
      "document_fields": null,
      "lines": [
        {
          "bounding_polygon": {
            "normalized_vertices": [
              {
                "x": 0.016,
                "y": 0.04563758389261745
              },
              {
                "x": 0.23885714285714285,
                "y": 0.04563758389261745
              },
              {
                "x": 0.23885714285714285,
                "y": 0.08456375838926175
              },
              {
                "x"

### Using DocumentKeyValueDetectionFeature

Key value extraction can be used to identify values for predefined keys in a receipt. For example, if a receipt includes a merchant name, merchant address, or merchant phone number, Vision can identify these values and return them as a key value pair.
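
The key-value pairs can be collected into a flat dictionary once the response is available. This is a minimal sketch on a plain dictionary shaped like the printed JSON. The `field_label`, `field_type`, and `field_value` keys appear in the output of this section, but the `text` key inside `field_value` is an assumption because the printed output is truncated before it. The helper name `extract_key_values` and the sample value are hypothetical.

```python
def extract_key_values(response: dict, min_confidence: float = 0.0) -> dict:
    """Map each detected key name to its extracted text.

    Fields whose label confidence is below `min_confidence` are skipped.
    The "text" key of "field_value" is an assumed field name.
    """
    result = {}
    for page in response.get("pages") or []:
        for field in page.get("document_fields") or []:
            if field.get("field_type") != "KEY_VALUE":
                continue
            label = field.get("field_label") or {}
            value = field.get("field_value") or {}
            name = label.get("name")
            if name is None or (label.get("confidence") or 0.0) < min_confidence:
                continue
            result[name] = value.get("text")
    return result


# Hypothetical sample: the label mirrors the printed output, the text is made up.
sample_kv = {
    "pages": [{
        "document_fields": [{
            "field_label": {"confidence": 0.9999999, "name": "MerchantName"},
            "field_type": "KEY_VALUE",
            "field_value": {"text": "Example Store"},
        }]
    }]
}
print(extract_key_values(sample_kv))
```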

In [10]:
analyze_document_response = ai_vision_client.analyze_document(
    analyze_document_details=oci.ai_vision.models.AnalyzeDocumentDetails(
        features=[
            oci.ai_vision.models.DocumentKeyValueDetectionFeature(
                feature_type="KEY_VALUE_DETECTION")
        ],
        compartment_id="ocid1.compartment.oc1..aaaaaaaabu6pgqbwe4are4ke7uzkq44rbvbnxwhybhmplialatq54kdvq4jq",
        document=oci.ai_vision.models.ObjectStorageDocumentDetails(
            source="OBJECT_STORAGE",
            namespace_name="ocuocictrng22",
            bucket_name="DSP-VISION",
            object_name="r2.pdf")))

# Get the data from response
print(analyze_document_response.data)

{
  "detected_document_types": null,
  "detected_languages": null,
  "document_classification_model_version": null,
  "document_metadata": {
    "mime_type": "application/pdf",
    "page_count": 1
  },
  "errors": null,
  "key_value_detection_model_version": "1.7.174",
  "language_classification_model_version": null,
  "pages": [
    {
      "detected_document_types": null,
      "detected_languages": null,
      "dimensions": {
        "height": 8.333333333333334,
        "unit": "INCH",
        "width": 5.013888888888889
      },
      "document_fields": [
        {
          "field_label": {
            "confidence": 0.9999999,
            "name": "MerchantName"
          },
          "field_name": null,
          "field_type": "KEY_VALUE",
          "field_value": {
            "bounding_polygon": {
              "normalized_vertices": [
                {
                  "x": 0.06679959970828238,
                  "y": 0.01019796073436737
                },
                {
    

## Vision provides pretrained image analysis AI models that allow you to locate and tag objects, text, and entire scenes in images.

### Create AI service vision client and get response object using Image

Supports JPG & PNG.

### Using ImageTextDetectionFeature

Vision can detect and recognize text in a document.

Language classification identifies the language of a document, then OCR draws bounding boxes around the printed or hand-written text it locates in an image, and digitizes the text. For example, if you have an image of a stop sign, Vision locates the text in that image and extracts the text STOP. It provides bounding boxes for the identified text.
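
The bounding boxes are returned as `normalized_vertices`, with `x` and `y` values between 0 and 1. To draw them on the image they need to be scaled by the image size. The sketch below shows one way to do that. The helper name `to_pixel_box` is hypothetical, and the image width and height are assumed to be known to the caller, since the image analysis output printed in this notebook does not include them.

```python
from typing import Dict, List, Tuple


def to_pixel_box(normalized_vertices: List[Dict[str, float]],
                 width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert normalized polygon vertices to an axis-aligned pixel box.

    Returns (left, top, right, bottom) rounded to whole pixels.
    """
    xs = [v["x"] * width for v in normalized_vertices]
    ys = [v["y"] * height for v in normalized_vertices]
    return round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys))


# Hypothetical polygon on a hypothetical 1000 x 500 pixel image.
polygon = [
    {"x": 0.1, "y": 0.2},
    {"x": 0.5, "y": 0.2},
    {"x": 0.5, "y": 0.4},
    {"x": 0.1, "y": 0.4},
]
print(to_pixel_box(polygon, width=1000, height=500))
```

For rotated text, such as the slanted polygon a logo can produce, the axis-aligned box is only an approximation. Drawing the four scaled vertices as a polygon preserves the rotation.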

Vision provides a confidence score between 0 and 1 for each text grouping. Scores closer to 1 indicate higher confidence in the extracted text, while lower scores indicate lower confidence.

Text Detection can be used with Document AI or Image Analysis models.

OCR support is limited to English.

In [11]:
analyze_image_response = ai_vision_client.analyze_image(
    analyze_image_details=oci.ai_vision.models.AnalyzeImageDetails(
        features=[
            oci.ai_vision.models.ImageTextDetectionFeature(
                feature_type="TEXT_DETECTION",
                language="ENG")],
        compartment_id="ocid1.compartment.oc1..aaaaaaaabu6pgqbwe4are4ke7uzkq44rbvbnxwhybhmplialatq54kdvq4jq",
        image=oci.ai_vision.models.ObjectStorageImageDetails(
            source="OBJECT_STORAGE",
            namespace_name="ocuocictrng22",
            bucket_name="DSP-VISION",
            object_name="r7.jpg")))

# Get the data from response
print(analyze_image_response.data)

{
  "detected_faces": null,
  "errors": [],
  "face_detection_model_version": null,
  "image_classification_model_version": null,
  "image_objects": null,
  "image_text": {
    "lines": [
      {
        "bounding_polygon": {
          "normalized_vertices": [
            {
              "x": 0.18181818181818182,
              "y": 0.41530054644808745
            },
            {
              "x": 0.7709090909090909,
              "y": 0.3005464480874317
            },
            {
              "x": 0.7854545454545454,
              "y": 0.5081967213114754
            },
            {
              "x": 0.19636363636363635,
              "y": 0.6229508196721312
            }
          ]
        },
        "confidence": 0.99717045,
        "text": "ORACLE",
        "word_indexes": [
          0
        ]
      }
    ],
    "words": [
      {
        "bounding_polygon": {
          "normalized_vertices": [
            {
              "x": 0.18181818181818182,
              "y": 0.4153

### Using ImageClassificationFeature

Image classification can be used to identify scene-based features and objects in an image. You can have one classification or many classifications, depending on the use case and the number of items in an image. For example, if you have an image of a person running, Vision identifies the person, the clothing, and the footwear.

Vision provides a confidence score between 0 and 1 for each label. Scores closer to 1 indicate higher confidence in the label, while lower scores indicate lower confidence.
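
In practice it is common to keep only the labels above a chosen confidence. The following is a minimal sketch on a plain dictionary shaped like the printed JSON. The helper name `labels_above` and the 0.9 threshold are illustrative assumptions. The sample values are copied from the output of this section.

```python
from typing import List


def labels_above(response: dict, threshold: float = 0.9) -> List[str]:
    """Return label names with confidence >= threshold, highest first."""
    labels = response.get("labels") or []
    kept = [label for label in labels if label["confidence"] >= threshold]
    kept.sort(key=lambda label: label["confidence"], reverse=True)
    return [label["name"] for label in kept]


sample_labels = {
    "labels": [
        {"confidence": 0.9925405, "name": "Transmission tower"},
        {"confidence": 0.992527, "name": "Overhead power line"},
        {"confidence": 0.98857516, "name": "Tower"},
        {"confidence": 0.9665545, "name": "Antenna"},
        {"confidence": 0.8804998, "name": "Vegetation"},
    ]
}
print(labels_above(sample_labels))
```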

In [12]:
analyze_image_response = ai_vision_client.analyze_image(
    analyze_image_details=oci.ai_vision.models.AnalyzeImageDetails(
        features=[
            oci.ai_vision.models.ImageClassificationFeature(
                feature_type="IMAGE_CLASSIFICATION"
                )],
        compartment_id="ocid1.compartment.oc1..aaaaaaaabu6pgqbwe4are4ke7uzkq44rbvbnxwhybhmplialatq54kdvq4jq",
        image=oci.ai_vision.models.ObjectStorageImageDetails(
            source="OBJECT_STORAGE",
            namespace_name="ocuocictrng22",
            bucket_name="DSP-VISION",
            object_name="r10.jpg")))

# Get the data from response
print(analyze_image_response.data)

{
  "detected_faces": null,
  "errors": [],
  "face_detection_model_version": null,
  "image_classification_model_version": "1.5.97",
  "image_objects": null,
  "image_text": null,
  "labels": [
    {
      "confidence": 0.9925405,
      "name": "Transmission tower"
    },
    {
      "confidence": 0.992527,
      "name": "Overhead power line"
    },
    {
      "confidence": 0.98857516,
      "name": "Tower"
    },
    {
      "confidence": 0.9665545,
      "name": "Antenna"
    },
    {
      "confidence": 0.8804998,
      "name": "Vegetation"
    }
  ],
  "object_detection_model_version": null,
  "ontology_classes": [
    {
      "name": "Overhead power line",
      "parent_names": [],
      "synonym_names": []
    },
    {
      "name": "Vegetation",
      "parent_names": [
        "Plant"
      ],
      "synonym_names": []
    },
    {
      "name": "Transmission tower",
      "parent_names": [],
      "synonym_names": []
    },
    {
      "name": "Tower",
      "parent_names": [

### Using ImageObjectDetectionFeature

Object detection is used to locate and identify objects within an image. Vision provides a confidence score between 0 and 1 for each object identified. Scores closer to 1 indicate higher confidence in the object's classification, while lower scores indicate lower confidence.
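
A simple way to summarize the detections is to count objects per name, ignoring low-confidence results. The sketch below assumes a plain dictionary shaped like the printed JSON, with an `image_objects` list of entries carrying `name` and `confidence`. The helper name `count_objects`, the threshold, and the "Wheel" entries in the sample are hypothetical. Only the "Car" entry mirrors the output of this section.

```python
from collections import Counter
from typing import Dict


def count_objects(response: dict, min_confidence: float = 0.5) -> Dict[str, int]:
    """Count detected objects per name, skipping low-confidence detections."""
    counts = Counter()
    for obj in response.get("image_objects") or []:
        if obj["confidence"] >= min_confidence:
            counts[obj["name"]] += 1
    return dict(counts)


sample_objects = {
    "image_objects": [
        {"confidence": 0.83714765, "name": "Car"},
        {"confidence": 0.71, "name": "Wheel"},   # hypothetical entry
        {"confidence": 0.30, "name": "Wheel"},   # hypothetical entry, below threshold
    ]
}
print(count_objects(sample_objects))
```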

In [13]:
analyze_image_response = ai_vision_client.analyze_image(
    analyze_image_details=oci.ai_vision.models.AnalyzeImageDetails(
        features=[
            oci.ai_vision.models.ImageObjectDetectionFeature(
                feature_type="OBJECT_DETECTION"
                )],
        compartment_id="ocid1.compartment.oc1..aaaaaaaabu6pgqbwe4are4ke7uzkq44rbvbnxwhybhmplialatq54kdvq4jq",
        image=oci.ai_vision.models.ObjectStorageImageDetails(
            source="OBJECT_STORAGE",
            namespace_name="ocuocictrng22",
            bucket_name="DSP-VISION",
            object_name="r9.jpg")))

# Get the data from response
print(analyze_image_response.data)

{
  "detected_faces": null,
  "errors": [],
  "face_detection_model_version": null,
  "image_classification_model_version": null,
  "image_objects": [
    {
      "bounding_polygon": {
        "normalized_vertices": [
          {
            "x": 0.01935483870967742,
            "y": 0.15950920245398773
          },
          {
            "x": 0.9903225806451613,
            "y": 0.15950920245398773
          },
          {
            "x": 0.9903225806451613,
            "y": 0.8282208588957055
          },
          {
            "x": 0.01935483870967742,
            "y": 0.8282208588957055
          }
        ]
      },
      "confidence": 0.83714765,
      "name": "Car"
    },
    {
      "bounding_polygon": {
        "normalized_vertices": [
          {
            "x": 0.05161290322580645,
            "y": 0.32515337423312884
          },
          {
            "x": 0.2032258064516129,
            "y": 0.32515337423312884
          },
          {
            "x": 0.203225806451

Vision also provides custom image analysis models that allow you to locate and tag objects, text, and entire scenes in images that are specific to your scenario. To use one, create a labeled dataset, instruct Vision to train a model on it, and call the custom model to evaluate new images.
Refer to
https://docs.oracle.com/en-us/iaas/vision/vision/using/custom_image_analysis_models.htm