# <b>KeyValue Extraction</b>

The <code>AIServiceDocumentClient</code> lets you build a custom <b>KeyValue extraction</b> model from a training dataset. This notebook shows how to call a trained model. <br>
<ul>
    <li>The raw output is saved as <code>response_document.json</code> in the <code>output</code> directory. </li>
</ul>

### Steps to run the notebook:
<details>
    <summary>Notebook session setup</summary>
    <ol>
        <li><font size="2">Installing the OCI SDK</font></li>
        <li><font size="2">Installing other dependencies</font></li>
        <li><font size="2">Set up sample input images</font></li>
        <li><font size="2">Create output folder</font></li>
        <li><font size="2">Set up helper .py files</font></li>
    </ol>
</details>

<details>
    <summary>Importing the required modules</summary>
</details>

<details>
    <summary>Setting the input variables</summary>
     <font size="2">Provide input values of your choice, or use the sample input supplied.</font>
</details>

<details>
    <summary>Running the main pipeline</summary>
    <font size="2">Run all cells to get the output in the <code>output</code> directory. </font><br>
</details>

### Notebook session setup
<details>
    <summary>Instructions</summary>
    <ul>
        <li><font size="2">The setup needs to be done only once.</font></li>
        <li><font size="2">Uncomment the commented cells and run them once to complete the setup.</font></li>
        <li><font size="2">Comment the same cells out again to avoid rerunning them.</font></li>
    </ul>
</details>

#### Installing the OCI Python SDK

In [None]:
# !pip3 install oci-cli
# !pip3 install --trusted-host=artifactory.oci.oraclecorp.com -i https://artifactory.oci.oraclecorp.com/api/pypi/global-dev-pypi/simple -U oci==2.88.2+preview.1.5970

#### Installing other dependencies

In [None]:
# !pip install matplotlib==3.3.4
# !pip install pandas==1.1.5

#### Set up sample input images

In [None]:
# !wget "https://objectstorage.us-ashburn-1.oraclecloud.com/n/axhheqi2ofpb/b/document_demo_notebooks/o/Ladingbill.png"
# !mkdir data
# !mv Ladingbill.png data

#### Create output folder

In [None]:
# !mkdir output

#### Set up helper .py files

In [None]:
# !wget "https://objectstorage.us-ashburn-1.oraclecloud.com/n/axhheqi2ofpb/b/document_demo_notebooks/o/analyze_document_utils.py"
# !mkdir helper
# !mv analyze_document_utils.py helper

### Imports

In [None]:
import base64
import uuid
import io
import json
from PIL import Image
import matplotlib.pyplot as plt
import requests
import oci
from helper.analyze_document_utils import is_url, clean_output, display_classes, create_processor_job_callback

### Set input variables
<details>
<summary><font size="3">input_path</font></summary>
<font size="2">The URL of the input image, or its file path within the notebook session.</font><br>
</details>
<details>
<summary><font size="3">compartment_id</font></summary>
<font size="2">The OCID of the compartment where the model is created. </font><br>
</details>
<details>
<summary><font size="3">namespace_name</font></summary>
<font size="2">The Object Storage namespace that contains the output bucket. </font><br>
</details>
<details>
<summary><font size="3">bucket_name</font></summary>
<font size="2">The name of the bucket that was created in Lab1. </font><br>
</details>
<details>
<summary><font size="3">model_id</font></summary>
<font size="2">The OCID of the model created in Lab2. This can be found in the model details. </font><br>
</details>

In [None]:
input_path = "data/Ladingbill.png"
compartment_id = "<compartment ID>" 
namespace_name = "<namespace>" 
bucket_name = "<bucketname>"
model_id = "<model ID>"
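
Before running the rest of the notebook, it can help to confirm that none of the input variables still holds its `<...>` placeholder. The sketch below is one way to check this; `find_unset_inputs` and `PLACEHOLDER` are hypothetical names introduced here for illustration and are not part of the notebook helpers or the OCI SDK.

```python
import re

# Hypothetical helper (not part of the notebook or the OCI SDK): reports
# which input variables are empty or still hold a "<...>" placeholder.
PLACEHOLDER = re.compile(r"^<[^<>]*>$")


def find_unset_inputs(inputs):
    """Return the names whose value is empty or still a <placeholder>."""
    return [
        name
        for name, value in inputs.items()
        if not value.strip() or PLACEHOLDER.match(value.strip())
    ]


# Same sample values as the cell above.
example_inputs = {
    "input_path": "data/Ladingbill.png",
    "compartment_id": "<compartment ID>",
    "namespace_name": "<namespace>",
    "bucket_name": "<bucketname>",
    "model_id": "<model ID>",
}
unset = find_unset_inputs(example_inputs)
print("Still unset:", unset)
```

An empty list means every variable has been given a real value.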

### Load the user configuration

In [None]:
config = oci.config.from_file('~/.oci/config')
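
The file at `~/.oci/config` is an INI-style file with one section per profile. As a rough pre-check, the sketch below uses only the standard library to list keys that are missing from a profile. The set of required keys reflects the usual API-key setup and is an assumption of this sketch; `missing_config_keys` is a hypothetical helper, and the sample values are made up. The SDK's own `oci.config.validate_config` remains the authoritative check.

```python
import configparser

# Keys an API-key based profile is expected to contain (assumption of this sketch).
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def missing_config_keys(config_text, profile="DEFAULT"):
    """Return the required keys that the given profile does not define."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile == "DEFAULT":
        section = parser.defaults()
    elif parser.has_section(profile):
        section = parser[profile]
    else:
        # Unknown profile: everything is missing.
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if key not in section]


# Made-up sample content; key_file is left out on purpose.
sample = """
[DEFAULT]
user=ocid1.user.oc1..example
fingerprint=aa:bb:cc
tenancy=ocid1.tenancy.oc1..example
region=us-ashburn-1
"""
print(missing_config_keys(sample))
```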

### View input image

In [None]:
if is_url(input_path):
    encoded_string = base64.b64encode(requests.get(input_path).content)
else:
    with open(input_path, "rb") as document_file:
        encoded_string = base64.b64encode(document_file.read())

image_data = base64.b64decode(encoded_string)
image = Image.open(io.BytesIO(image_data))
plt.gcf().set_dpi(200)
plt.axis('off')
plt.imshow(image)
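
The cell above branches on `is_url` and keeps the document as Base64-encoded bytes. The helper's implementation is not shown in this notebook, so the sketch below uses a hypothetical stand-in, `looks_like_url`, to illustrate the kind of check involved; the real helper may behave differently. It also shows the bytes-to-text round trip that the encoded document goes through later.

```python
import base64
from urllib.parse import urlparse


def looks_like_url(path):
    """Hypothetical stand-in for the helper's is_url; the real one may differ."""
    parsed = urlparse(path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Base64 round trip with placeholder bytes instead of a real image.
raw = b"\x89PNG example bytes"
encoded = base64.b64encode(raw)    # bytes, like encoded_string in the cell above
as_text = encoded.decode("utf-8")  # str form, suitable for a JSON request body
decoded = base64.b64decode(as_text)

print(looks_like_url("data/Ladingbill.png"))
print(decoded == raw)
```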

### Create AI service document client

In [None]:
ai_service_document_client = oci.ai_document.AIServiceDocumentClientCompositeOperations(oci.ai_document.AIServiceDocumentClient(config=config))
key_value_extraction_feature = oci.ai_document.models.DocumentKeyValueExtractionFeature()
key_value_extraction_feature.model_id = model_id

### Set the output location

In [None]:
output_location = oci.ai_document.models.OutputLocation()
output_location.namespace_name = namespace_name
output_location.bucket_name = bucket_name
output_location.prefix = "prefix"

### Create Object for Processor Job Details

In [None]:
create_processor_job_details_key_value_extraction = oci.ai_document.models.CreateProcessorJobDetails(
                                                    display_name=str(uuid.uuid4()),
                                                    compartment_id=compartment_id,
                                                    input_location=oci.ai_document.models.InlineDocumentContent(data=encoded_string.decode('utf-8')),
                                                    output_location=output_location,
                                                    processor_config=oci.ai_document.models.GeneralProcessorConfig(features=[key_value_extraction_feature]))


### Create the Processor Job
The processor job is created, and the call blocks until the job reaches the <code>SUCCEEDED</code> state.

In [None]:
create_processor_response = ai_service_document_client.create_processor_job_and_wait_for_state(
    create_processor_job_details=create_processor_job_details_key_value_extraction,
    wait_for_states=[oci.ai_document.models.ProcessorJob.LIFECYCLE_STATE_SUCCEEDED],
    waiter_kwargs={"wait_callback": create_processor_job_callback})
print("Processor call is in {} state with request_id: {}.\n".format(create_processor_response.data.lifecycle_state, create_processor_response.request_id))
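
While the call above waits, the function passed as `wait_callback` is invoked between status checks. The body of `create_processor_job_callback` lives in the helper file and is not shown here, so the sketch below is only an illustration of what such a callback could look like. It assumes the waiter calls it with the number of checks made so far and the most recent response; `progress_callback` and the fake response object are hypothetical.

```python
from types import SimpleNamespace


def progress_callback(times_called, response):
    """Hypothetical callback; assumes a (times_called, response) call shape."""
    state = getattr(response.data, "lifecycle_state", "UNKNOWN")
    message = "Check {}: processor job is {}".format(times_called, state)
    print(message)
    return message


# Stand-in for an SDK response, used only to exercise the callback locally.
fake_response = SimpleNamespace(data=SimpleNamespace(lifecycle_state="IN_PROGRESS"))
progress_callback(1, fake_response)
```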

### Processor Job response

In [None]:
processor_job: oci.ai_document.models.ProcessorJob = create_processor_response.data
print(create_processor_response.data)

### Getting the output JSON file from Object Storage
The job output is stored in the output location configured earlier. Retrieve it with the Object Storage client.

In [None]:
object_storage_client = oci.object_storage.ObjectStorageClient(config=config)
get_object_response = object_storage_client.get_object(namespace_name=output_location.namespace_name,
                                                       bucket_name=output_location.bucket_name,
                                                       object_name="{}/{}/_/results/defaultObject.json".format(
                                                           output_location.prefix, processor_job.id))
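
The object name in the call above is assembled from the output prefix, the processor job ID, and a fixed suffix. A small sketch of that path construction follows; `result_object_name` is a hypothetical helper, the suffix is copied from the cell above, and `job-123` is a made-up ID.

```python
# Suffix taken from the notebook cell above; treat it as specific to this notebook.
DEFAULT_RESULT_SUFFIX = "_/results/defaultObject.json"


def result_object_name(prefix, job_id, suffix=DEFAULT_RESULT_SUFFIX):
    """Hypothetical helper: join prefix, job ID and suffix into an object name."""
    return "{}/{}/{}".format(prefix, job_id, suffix)


print(result_object_name("prefix", "job-123"))
```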

### Clean and save the API response as json

In [None]:
res_json = json.loads(get_object_response.data.content.decode('utf-8'))
clean_res = clean_output(res_json)
with open('output/response_document.json', 'w') as fp:
    json.dump(clean_res, fp)
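
The save step above fails if the `output` folder does not exist yet. The sketch below shows one way to make the write self-sufficient and the file easier to read, by creating the folder on demand and indenting the JSON. `save_json` is a hypothetical helper, and the payload is a placeholder rather than a real service response.

```python
import json
import os
import tempfile


def save_json(payload, path):
    """Hypothetical helper: create the parent folder, then write indented JSON."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(payload, fp, indent=2, ensure_ascii=False)


# Write to a temporary folder and read the file back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output", "response_document.json")
    save_json({"example": "value"}, target)
    with open(target, encoding="utf-8") as fp:
        restored = json.load(fp)

print(restored)
```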

### Display the classes with their confidence levels

In [None]:
display_classes(clean_res)