### Step 1 - Install all Python Packages

In [None]:
%pip install azure-storage-blob==12.22.0
%pip install azure-identity==1.17.1
%pip install azure-ai-formrecognizer==3.3.3
%pip install python-dotenv==1.0.1

### Step 2 - Create Azure Client

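Now we create the clients needed to train and test the model: a Blob Storage client for uploading the training documents and a Document Analysis client for running the analysis.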

In [4]:
import os
from dotenv import load_dotenv
from azure.storage.blob import BlobServiceClient
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

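# Load settings from a local .env file into the process environment
load_dotenv()

# Blob Storage client, used to upload the training documents
blob_service_client = BlobServiceClient.from_connection_string(os.environ["STORAGE_NAME_CNX_STRING"])

# Document Analysis client, used to analyze documents with a model
creds = AzureKeyCredential(os.environ["FORM_RECOGNIZER_KEY"])
document_client = DocumentAnalysisClient(endpoint=os.environ["FORM_RECOGNIZER_ENDPOINT"], credential=creds)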
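
If one of the three settings is missing, the client constructors fail with errors that do not name the missing variable. A minimal sketch of a pre-flight check follows; the helper name `missing_settings` is hypothetical, and only the variable names come from the cell above.

```python
import os

# Variable names used by the client-creation cell above
REQUIRED_VARS = (
    "STORAGE_NAME_CNX_STRING",
    "FORM_RECOGNIZER_KEY",
    "FORM_RECOGNIZER_ENDPOINT",
)


def missing_settings(env=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not (env.get(name) or "").strip()]


# Demonstration with an explicit mapping instead of the real environment
example_env = {
    "STORAGE_NAME_CNX_STRING": "UseDevelopmentStorage=true",
    "FORM_RECOGNIZER_KEY": "",
}
print(missing_settings(example_env))
```

Calling `missing_settings()` with no arguments right after `load_dotenv()` checks the real environment; an empty list means all three settings are present.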

### Step 3 - Create a container for training and upload all training assets

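The cell below creates a container named `training` and uploads every file from `../documents/training` to it. If the container already exists, it is deleted and recreated so the upload starts from an empty container.

Once the documents are uploaded, we can start the training process in Document Intelligence Studio: https://documentintelligence.ai.azure.com/studio/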

In [None]:
import time

container_name="training"

container_client = blob_service_client.get_container_client(container_name)

if container_client.exists():
    # Start from an empty container: delete the old one before recreating it
    container_client.delete_container()
    time.sleep(120)  # deletion is asynchronous; wait before reusing the name
container_client.create_container()

directory_path = "../documents/training"

for filename in os.listdir(directory_path):
    file_path = os.path.join(directory_path, filename)
    if os.path.isfile(file_path):
        print(filename)
        with open(file_path, "rb") as data:
            container_client.upload_blob(name=filename, data=data)

print("Now go to Document Intelligence Studio: https://documentintelligence.ai.azure.com/studio/")
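
The fixed two-minute sleep above can be either too long or too short, because container deletion finishes at an unpredictable time. A possible alternative is to retry the creation until it succeeds. The sketch below is a generic helper; `create_with_retry` and its parameters are hypothetical names, not part of the Azure SDK.

```python
import time


def create_with_retry(create, retry_on=(Exception,), attempts=12,
                      delay_seconds=10.0, sleep=time.sleep):
    """Call create() until it succeeds or the attempts run out.

    Exceptions listed in retry_on trigger another attempt after a pause;
    the exception from the final attempt is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return create()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay_seconds)


# Possible use in this notebook (assumption: a create call made while the
# old container is still being deleted raises ResourceExistsError):
#
#   from azure.core.exceptions import ResourceExistsError
#   create_with_retry(container_client.create_container,
#                     retry_on=(ResourceExistsError,))
```

Restricting `retry_on` to the one expected exception keeps genuine failures, such as a bad connection string, from being retried silently.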


### Step 4 - Test the custom model

**BE SURE TO TRAIN THE MODEL BEFORE RUNNING THIS CELL**

Before running this cell, train the model in the studio (https://documentintelligence.ai.azure.com/studio/) by following the instructions in the **README.md** file. The `model_name` used below must match the ID of the trained model.

In [22]:
file_path = "../documents/test/test1.pdf"
model_name = "ModelV1"

with open(file_path, "rb") as data:
    poller = document_client.begin_analyze_document(model_name, data)

# Block until the analysis finishes. This is for demo purposes only and
# shouldn't be done in a production workload.
result = poller.result()

for idx, document in enumerate(result.documents):
    print(f"--------Analyzing document #{idx + 1}--------")
    print(f"Document has type {document.doc_type}")
    print(f"Document has document type confidence {document.confidence}")
    print(f"Document was analyzed with model with ID {result.model_id}")
    for name, field in document.fields.items():
        field_value = field.value if field.value else field.content
        print(
            f"......found label '{name}' for field of type '{field.value_type}' with value '{field_value}' and with confidence {field.confidence}"
        )

--------Analyzing document #1--------
Document has type ModelV1:ModelV1
Document has document type confidence 0.857
Document was analyzed with model with ID ModelV1
......found label 'Objective and Goals' for field of type 'string' with value 'Migrate workflow to cloud native' and with confidence 0.994
......found label 'Name of Candidate' for field of type 'string' with value 'Marco Doe' and with confidence 0.991
......found label 'Position Applied For' for field of type 'string' with value 'Biztalk Integrator' and with confidence 0.995
......found label 'Date of Submission' for field of type 'string' with value '2022-01-01' and with confidence 0.995
......found label 'Contact-Number' for field of type 'string' with value '999-999-9999' and with confidence 0.994
......found label 'Other Information' for field of type 'string' with value 'None' and with confidence 0.919
......found label 'Job Application Id' for field of type 'string' with value '898219022' and with confidence 0.99
...
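
Every field in the output above carries a confidence score, so a natural follow-up is to flag the fields that deserve a manual review. The sketch below assumes only that each field object exposes a `confidence` attribute, as in the loop above; the helper name `fields_below` and the 0.95 threshold are hypothetical choices, and the sample objects are stand-ins built from two lines of the printed output.

```python
from types import SimpleNamespace


def fields_below(fields, threshold=0.95):
    """Map field name -> confidence for fields scoring under the threshold."""
    return {
        name: field.confidence
        for name, field in fields.items()
        if field.confidence is not None and field.confidence < threshold
    }


# Stand-ins for two entries of document.fields, using the scores printed above
sample_fields = {
    "Name of Candidate": SimpleNamespace(confidence=0.991),
    "Other Information": SimpleNamespace(confidence=0.919),
}
print(fields_below(sample_fields))  # {'Other Information': 0.919}
```

With a real result, the call would be `fields_below(document.fields)` inside the loop over `result.documents`.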

Now let's analyze the same document with the prebuilt model (`prebuilt-document`) instead of the custom one.

In [21]:

file_path = "../documents/test/test1.pdf"
model_name = "prebuilt-document"

with open(file_path, "rb") as data:
    poller = document_client.begin_analyze_document(model_name, data)

# Block until the analysis finishes. This is for demo purposes only and
# shouldn't be done in a production workload.
result = poller.result()


print("----Key-value pairs found in document----")
for kv_pair in result.key_value_pairs:
    if kv_pair.key and kv_pair.value:
        print("Key '{}': Value: '{}'".format(kv_pair.key.content, kv_pair.value.content))
    elif kv_pair.key:
        # Key detected without a value; pairs with no key at all are skipped
        print("Key '{}': Value:".format(kv_pair.key.content))

print("----------------------------------------")

----Key-value pairs found in document----
Key 'Job Application Type:': Value: 'Biztalk Integrator'
Key 'Name of Candidate:': Value: 'Marco Doe'
Key 'Job Application Id:': Value: '898219022'
Key 'Date of Submission of Job Application:': Value: '2022-01-01'
Key 'Position Applied for:': Value: 'Biztalk Integrator'
Key 'Contact Number:': Value: '999-999-9999'
Key 'Objectives and Goals:': Value: 'Migrate workflow to cloud native'
Key 'Name of Degree': Value: 'Bachelor in Science'
Key 'Institution': Value: 'Random University'
Key 'Year of Passing
Marks Obtained': Value: '1998
95'
Key 'Name of Company': Value: 'Super Company'
Key 'Job Position Held': Value: 'Biztalk Integrator'
Key 'Duration of Working': Value: '10 years'
Key 'Reasons of Quitting': Value: 'New challenge'
Key 'Other Information:': Value:
----------------------------------------
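
The two outputs name the same information slightly differently: the custom model returns the labels defined during training (for example `Contact-Number`), while the prebuilt model returns the text found on the page (`Contact Number:`). A rough sketch for lining the two up follows; `normalize_label` is a hypothetical helper, and the lists are a subset copied from the outputs above.

```python
import re


def normalize_label(label):
    """Lowercase a label and replace punctuation runs with single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


# Subset of the names printed by the two cells above
custom_labels = ["Name of Candidate", "Contact-Number",
                 "Job Application Id", "Position Applied For"]
prebuilt_keys = ["Name of Candidate:", "Contact Number:",
                 "Job Application Id:", "Position Applied for:", "Institution"]

custom = {normalize_label(label) for label in custom_labels}
prebuilt = {normalize_label(key) for key in prebuilt_keys}

print("Found by both models:", sorted(custom & prebuilt))
print("Found by the prebuilt model only:", sorted(prebuilt - custom))
```

This only handles differences in case and punctuation. Names that differ in wording, such as `Objective and Goals` and `Objectives and Goals:`, would still need a manual mapping.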
