# Azure Document Intelligence Custom Template User Feedback Loop Experiment

This experiment replicates the custom model training process of [Azure AI Document Intelligence](https://learn.microsoft.com/en-GB/azure/ai-services/document-intelligence/overview) Studio to show how a user feedback loop can improve the quality of document processing results.

The notebook provides an interactive feedback experience: a user draws over an uploaded, analyzed document to highlight incorrect or missing information and supply corrections. The same approach could be replicated in any client application using the capabilities of your chosen framework.

The goal is to show how developers of custom models in Azure AI Document Intelligence can collect feedback from users and use it to retrain and improve the model.

> **Note**: This notebook provides _one_ potential approach to user interaction, and can be adapted in many ways to suit your use case.

## Pre-requisites

> **Note**: Before continuing, please ensure that the [`Deploy-Infrastructure.ps1`](./Deploy-Infrastructure.ps1) script has been run to deploy the required infrastructure to Azure. This includes the Azure AI Document Intelligence resource and the Azure Storage account for creating a custom model.

This notebook uses [Dev Containers](https://code.visualstudio.com/docs/remote/containers) to ensure that all the required dependencies are available in a consistent local development environment.

The following are required to run this notebook:

- [Visual Studio Code](https://code.visualstudio.com/)
- [Docker Desktop](https://www.docker.com/products/docker-desktop)
- [Dev Containers extension for Visual Studio Code](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) (previously named "Remote - Containers")

> **Note**: The Dev Container is pre-configured with the required dependencies and extensions. You can run this notebook outside of a Dev Container, but you will need to manually install the required dependencies including Poppler, Tesseract, and OpenCV.

The Dev Container will include the following dependencies by default:

- Debian 11 (Bullseye) base image
- Python 3.12
  - azure-ai-formrecognizer - for interacting with the Azure AI Document Intelligence service
  - azure-core - for shared Azure SDK types such as `AzureKeyCredential`
  - ipycanvas - for rendering the document and allowing the user to draw over it
  - ipykernel - for running the notebook
  - notebook - for running the notebook
  - opencv-python-headless - for image processing
  - pdf2image - for converting PDFs to images
  - pytesseract - for performing OCR on the document
- Poppler - used by pdf2image to convert PDFs to images
- Tesseract OCR - used by pytesseract to perform OCR on the document
- Python3 OpenCV - used for image processing
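
If you run the notebook outside the Dev Container, a quick check like the one below can confirm that the native tools are on your `PATH` before any cells fail. This is a minimal sketch: the helper name `find_missing_tools` is hypothetical, and it assumes pdf2image calls Poppler's `pdftoppm` executable and pytesseract calls `tesseract`.

```python
import shutil

# Executables that the Python packages shell out to.
# Assumption: pdf2image uses Poppler's `pdftoppm`; pytesseract uses `tesseract`.
REQUIRED_TOOLS = {"Poppler": "pdftoppm", "Tesseract OCR": "tesseract"}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable cannot be found on PATH."""
    return [name for name, executable in tools.items() if which(executable) is None]


missing = find_missing_tools()
print("Missing tools:", missing or "none")
```

An empty result only shows that the executables are discoverable; it does not verify their versions.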

## Create a Document Intelligence Custom Model

In order to improve a model, you need to have one. This experiment comes prepared with the data required to train a custom model. The data is located in the [`model_training`](./model_training/) directory and contains a set of invoices that will be used by the following steps.

The steps below perform the following:

- Initialize local environment variables based on the output of the `Deploy-Infrastructure.ps1` script run. The environment variables will be available in the [`config.env`](./config.env) file.
- Create a model training client (using the provided class) and run it to upload the files to Azure Blob Storage and train the model using Azure AI Document Intelligence.

> **Note**: The `Deploy-Infrastructure.ps1` script is not run as part of this notebook. It must be run separately, prior to running this notebook, to deploy the required infrastructure to Azure.
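
Because the notebook depends on values written to `config.env`, it can help to fail early when one is missing. The sketch below lists the keys that `ModelTrainingClient` reads in this notebook; the helper `missing_config_keys` is a hypothetical addition, not part of the provided class.

```python
# Keys read by ModelTrainingClient in this notebook.
REQUIRED_KEYS = [
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
    "AZURE_DOCUMENT_INTELLIGENCE_KEY",
    "AZURE_STORAGE_ACCOUNT_CONNECTION_STRING",
    "AZURE_STORAGE_ACCOUNT_KEY",
    "AZURE_DOCUMENT_INTELLIGENCE_TRAINING_DATA_CONTAINER_NAME",
]


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]


# Example with a placeholder dictionary standing in for dotenv_values(...)
example_config = {"AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT": "https://example.invalid"}
print(missing_config_keys(example_config))
```

In the notebook, the same function could be called on the `config` dictionary immediately after it is loaded.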

In [None]:
import os
import datetime
from azure.ai.formrecognizer import (DocumentModelAdministrationClient, ModelBuildMode, DocumentAnalysisClient, AnalyzeResult)
from azure.core.credentials import AzureKeyCredential
from azure.storage.blob import BlobServiceClient, ContainerSasPermissions, generate_container_sas
from dotenv import dotenv_values

working_dir = os.path.abspath('')
config = dotenv_values(f"{working_dir}/config.env")

model_name = 'invoices'
initial_model_version = '1.0.0'

initial_model_id = f"{model_name}-{initial_model_version}"

In [None]:
class ModelTrainingClient:
    def __init__(self, config):
        document_intelligence_endpoint = config['AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT']
        document_intelligence_key = config['AZURE_DOCUMENT_INTELLIGENCE_KEY']
        storage_connection_string = config['AZURE_STORAGE_ACCOUNT_CONNECTION_STRING']
        training_data_container_name = config['AZURE_DOCUMENT_INTELLIGENCE_TRAINING_DATA_CONTAINER_NAME']

        blob_service_client = BlobServiceClient.from_connection_string(storage_connection_string)

        self.storage_account_key = config['AZURE_STORAGE_ACCOUNT_KEY']
        self.training_data_container_client = blob_service_client.get_container_client(training_data_container_name)
        self.document_model_admin_client = DocumentModelAdministrationClient(endpoint=document_intelligence_endpoint, credential=AzureKeyCredential(document_intelligence_key))
        self.document_analysis_client = DocumentAnalysisClient(endpoint=document_intelligence_endpoint, credential=AzureKeyCredential(document_intelligence_key))

    def upload_training_data(self, training_data_folder_path):
        for root, _, files in os.walk(training_data_folder_path):
            for file in files:
                blob_client = self.training_data_container_client.get_blob_client(file)
                with open(os.path.join(root, file), "rb") as data:
                    blob_client.upload_blob(data, overwrite=True)

        start_time = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(minutes=5)
        expiry_time = start_time + datetime.timedelta(days=1)

        sas_token = generate_container_sas(
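            # Use the container client so this works even when no files were uploaded
            account_name=self.training_data_container_client.account_name,
            container_name=self.training_data_container_client.container_name,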
            account_key=self.storage_account_key,
            permission=ContainerSasPermissions(read=True, list=True),
            expiry=expiry_time,
            start=start_time
        )

        self.training_data_container_client_sas_url = f"{self.training_data_container_client.url}?{sas_token}"        
    
    def create_model(self, model_name):
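        try:
            # Remove any previous build with the same ID so it can be rebuilt
            self.document_model_admin_client.delete_document_model(model_name)
        except Exception:
            # The model may not exist yet; nothing to delete
            pass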

        poller = self.document_model_admin_client.begin_build_document_model(
            build_mode=ModelBuildMode.TEMPLATE,
            blob_container_url=self.training_data_container_client_sas_url,
            model_id=model_name
        )
        self.model = poller.result()
        return self.model
    
    def run_layout_analysis(self, file_path):
        with open(file_path, "rb") as f:
            poller = self.document_analysis_client.begin_analyze_document(model_id='prebuilt-layout', document=f)
            self.analysis_result = poller.result()
        return self.analysis_to_json(self.analysis_result)

    def analysis_to_json(self, analysis_result: AnalyzeResult):
        date = datetime.datetime.now(datetime.timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

        return {
            "status": "succeeded",
            "createdDateTime": date,
            "lastUpdatedDateTime": date,
            "analyzeResult": analysis_result.to_dict()
        }

In [None]:
model_training_client = ModelTrainingClient(config)

### Upload initial training data and create the model

In [None]:
model_training_client.upload_training_data(f"{working_dir}/model_training")
invoice_model = model_training_client.create_model(model_name=initial_model_id)

## User Feedback Setup

The following cell sets up the imports and constants required by the rest of the notebook, including the path to the sample invoice PDF, and creates the directory used to store the rendered page images.

In [None]:
from pdf2image import convert_from_path
from ipycanvas import Canvas
from ipywidgets import Image, Dropdown, Text, VBox, Label
import pytesseract
import cv2
import json

pdf_file_name = 'Invoice_6.pdf'
pdf_dir = os.path.join(working_dir, 'pdfs')
pdf_path = os.path.join(pdf_dir, pdf_file_name)

images_dir = os.path.join(working_dir, 'images')
os.makedirs(images_dir, exist_ok=True)

## Define object for tracking feedback regions

The following class defines the rectangular border that the user draws over the document to provide feedback. It tracks the start and end coordinates of the border and provides functions for drawing the border, normalizing the coordinates for the labels JSON output, and extracting the text within the border using OCR.

In [None]:
class SquareBorder:
    def __init__(self, image_path_ref: str, page_ref: int, border_width=2, border_color='black'):
        self.image_path_ref = image_path_ref
        self.page_ref = page_ref
        self.border_width = border_width
        self.border_color = border_color
        self.start_x = None
        self.start_y = None
        self.end_x = None
        self.end_y = None

    def start(self, x, y):
        self.start_x = x
        self.start_y = y

    def end(self, x, y):
        self.end_x = x
        self.end_y = y

    def draw(self, canvas: Canvas):
        canvas.stroke_style = self.border_color
        canvas.line_width = self.border_width
        canvas.stroke_rect(self.start_x, self.start_y, self.end_x - self.start_x, self.end_y - self.start_y)
        self.normalize(canvas)

    def normalize(self, canvas: Canvas):
        # normalize the square_border pixels 0..1
        self.start_x_normalized = self.start_x / canvas.width
        self.start_y_normalized = self.start_y / canvas.height
        self.end_x_normalized = self.end_x / canvas.width
        self.end_y_normalized = self.end_y / canvas.height

    def extract_text(self):
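        try:
            # Sort the corners so a drag in any direction yields a valid crop
            left, right = sorted((int(self.start_x), int(self.end_x)))
            top, bottom = sorted((int(self.start_y), int(self.end_y)))

            img = cv2.imread(self.image_path_ref)
            crop_img = img[top:bottom, left:right]
            # Strip the trailing whitespace that OCR output usually carries
            return pytesseract.image_to_string(crop_img).strip()
        except Exception:
            # OCR is best-effort; leave the text empty for the user to fill in
            return ''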

    def get_bounding_box(self):
        return [(self.start_x, self.start_y), (self.end_x, self.end_y)]

    def get_normalized_bounding_box(self):
        return [self.start_x_normalized, self.start_y_normalized, self.end_x_normalized, self.start_y_normalized, self.end_x_normalized, self.end_y_normalized, self.start_x_normalized, self.end_y_normalized]
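
As a standalone illustration of the normalization step, the sketch below converts two drag corners in pixels into the eight-value polygon (top-left, top-right, bottom-right, bottom-left) with coordinates in the range 0..1. The function `normalized_polygon` is hypothetical and, unlike the class above, sorts the corners so that a drag in any direction gives the same result.

```python
def normalized_polygon(start, end, width, height):
    """Convert two drag corners (in pixels) into an 8-value polygon in 0..1.

    The points are ordered top-left, top-right, bottom-right, bottom-left.
    """
    left, right = sorted((start[0], end[0]))
    top, bottom = sorted((start[1], end[1]))
    x0, x1 = left / width, right / width
    y0, y1 = top / height, bottom / height
    return [x0, y0, x1, y0, x1, y1, x0, y1]


# A 200x100 pixel box drawn on a 1000x500 pixel page
print(normalized_polygon((100, 50), (300, 150), 1000, 500))
# [0.1, 0.1, 0.3, 0.1, 0.3, 0.3, 0.1, 0.3]
```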

## Load the PDF document into view for user feedback

### Canvas handling functions

The code below defines the functions that create a canvas for each page and handle mouse interaction events. They are used to render the document and to draw the feedback borders.

In [None]:
square_borders = []

def handle_mouse_down_start_draw(canvas: Canvas, x, y):
    square_border = SquareBorder(canvas.image_path_ref, canvas.page_ref)
    square_border.start(x, y)
    square_borders.append(square_border)

def handle_mouse_down_end_draw(canvas: Canvas, x, y):
    square_border = square_borders[-1]
    square_border.end(x, y)
    square_border.draw(canvas)

def analyze_pdf(file_path: str):
    layout_analysis = model_training_client.run_layout_analysis(file_path)
    layout_analysis_path_ref = os.path.join(pdf_dir, f'{pdf_file_name}.ocr.json')
    with open(layout_analysis_path_ref, 'w') as f:
        json.dump(layout_analysis, f)

def load_pdf(file_path: str):
    pages = convert_from_path(file_path, fmt='jpeg')

    print(f'Loaded {len(pages)} pages')

    canvases = [Canvas(width=page.width, height=page.height) for page in pages]

    for i, page in enumerate(pages):
        page_ref = i + 1
        image_path_ref = os.path.join(images_dir, f'{pdf_file_name}.page_{page_ref}.jpg')
        page.save(image_path_ref, 'JPEG')
    
        canvases[i].image_path_ref = image_path_ref
        canvases[i].page_ref = page_ref

        canvases[i].draw_image(Image.from_file(image_path_ref), 0, 0, pages[i].width, pages[i].height)
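        # Bind the current canvas as a default argument so each handler targets its own page
        canvases[i].on_mouse_down(lambda x, y, canvas=canvases[i]: handle_mouse_down_start_draw(canvas, x, y))
        canvases[i].on_mouse_up(lambda x, y, canvas=canvases[i]: handle_mouse_down_end_draw(canvas, x, y))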

    return canvases
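
One thing to watch for when registering per-page callbacks inside a loop is that Python closures look up loop variables when they are called, not when they are created. The sketch below, which uses hypothetical helper names and no canvas at all, shows the difference between a late-bound lambda and one that captures the value through a default argument.

```python
def make_handlers_late(count):
    # Each lambda reads `i` when it is called, after the loop has finished
    return [lambda: i for i in range(count)]


def make_handlers_bound(count):
    # The default argument captures the value of `i` at creation time
    return [lambda i=i: i for i in range(count)]


print([handler() for handler in make_handlers_late(3)])   # [2, 2, 2]
print([handler() for handler in make_handlers_bound(3)])  # [0, 1, 2]
```

For a multi-page PDF, the late-bound form would route every mouse event to the last page's canvas, so the default-argument form is the safer choice.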

### Run layout analysis on the PDF document using Azure AI Document Intelligence

This step will use the Azure AI Document Intelligence service to perform layout analysis on the PDF document. When complete, the analysis will be captured as a JSON object and stored in the `pdfs` folder using the name of the PDF with the additional suffix `.ocr.json`.

> **Note**: This specific step does not need to be run every time. The layout analysis is only required to be run once to capture the initial state of the document.

In [None]:
analyze_pdf(pdf_path)
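
Since the layout analysis only needs to run once per document, one option is to skip the service call when the `.ocr.json` file already exists. The sketch below is an assumption about how that could look: `analyze_once` is a hypothetical helper, and `analyze` stands in for any callable that returns the JSON-serializable result, such as `model_training_client.run_layout_analysis`.

```python
import json
import os


def analyze_once(file_path, output_path, analyze):
    """Run `analyze(file_path)` only if `output_path` does not exist yet.

    Returns the cached result when the output file is already present.
    """
    if os.path.exists(output_path):
        with open(output_path, "r") as f:
            return json.load(f)

    result = analyze(file_path)
    with open(output_path, "w") as f:
        json.dump(result, f)
    return result
```

Delete the cached file to force a fresh analysis, for example after replacing the PDF.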

### Display the PDF in the notebook for user feedback

The following code will:

1. Load the PDF document and store each page as an image using pdf2image.
1. Display the rendered image using Canvas below as an interactive element in an output cell. **Note**: The image is rendered at the original size of the PDF page.
1. Allow you to draw borders over the rendered image by clicking/holding, dragging, and releasing the mouse.

Below is an example of this interaction in action.

![Demonstration of canvas selection](./media/canvas-selection.gif)

> **Note**: This simple demonstration does not allow drawn borders to be removed or edited once drawn. To start again, you will need to re-run the cell.

As well as displaying the PDF, the required fields for the model are also displayed.

In [None]:
fields_file_path = os.path.join(working_dir, 'model_training', 'fields.json')
with open(fields_file_path, 'r') as f:
    fields = json.load(f)

display_fields = [Label(value=field['fieldKey']) for field in fields['fields']]
display_container = VBox(display_fields)

print('Fields to capture')
display(display_container)

canvases = load_pdf(pdf_path)
canvases[0]

## Process the user feedback into Document Intelligence labels format

Once the user has drawn borders over the document to provide feedback, the following code will process the drawn borders into the labels JSON format used by the Azure AI Document Intelligence service. The files will be saved to the `./pdfs` directory with the name format `<pdf_file_name>.labels.json`.

In a real-world scenario, the labels JSON files could be loaded into a UI to allow the user to update the label names associated with the custom model, and then retrain the model using the updated labels and PDF documents. For the purposes of this experiment, these are rendered as UI inputs in the notebook.

### Document Intelligence Label Class

The following class is used to define the labels for the Azure AI Document Intelligence service and render the labels as UI inputs in the notebook.

In [None]:
class DocumentIntelligenceLabel:
    def __init__(self, border: SquareBorder, fields):
        self.label = ''
        self.field = ''
        self.item_row_number = ''
        self.item_row_field = ''
        self.label_type = None
        self.text = border.extract_text()
        self.border = border
        self.fields = fields

        self.ui_field = None
        self.ui_text = None
        self.ui_bounding_box = None
        self.ui_row_number = None
        self.ui_row_field = None
        self.ui_row_container = None
        self.ui_container = None

    def render(self):
        field_options = ['']
        for field in self.fields['fields']:
            field_options.append(field['fieldKey'])

        self.ui_field = Dropdown(
            options=field_options,
            description='Field:',
            continuous_update=True
        )
        self.ui_field.observe(self.handle_field_change, names='value')

        self.ui_text = Text(
            value=self.text,
            description='Text:',
            continuous_update=True
        )
        self.ui_text.observe(self.handle_text_change, names='value')

        self.ui_bounding_box = Label(value=f'Bounding Box: {self.border.get_bounding_box()}')

        self.ui_container = VBox([self.ui_field, self.ui_text, self.ui_bounding_box])

        return self.ui_container

    def handle_field_change(self, change):
        self.field = change.new

        field_option = next((x for x in self.fields['fields'] if x['fieldKey'] == change.new), None)
        if field_option:
            if field_option['fieldType'] == "array":
                itemType = field_option['itemType']
                definition = self.fields['definitions'][itemType]

                # Add a text box to the existing vbox for the row number
                self.ui_row_number = Text(
                    value='',
                    description='Row Number:',
                    continuous_update=True
                )
                self.ui_row_number.observe(self.handle_row_number_change, names='value')

                row_field_options = ['']
                for row_field in definition['fields']:
                    row_field_options.append(row_field['fieldKey'])

                self.ui_row_field = Dropdown(
                    options=row_field_options,
                    description='Row Field:',
                    continuous_update=True
                )
                self.ui_row_field.observe(self.handle_row_field_change, names='value')

                self.ui_row_container = VBox([self.ui_row_number, self.ui_row_field])
                self.ui_container.children = self.ui_container.children + (self.ui_row_container,)
            else:
                self.label = self.field

                if field_option['fieldType'] == "signature":
                    self.label_type = "region"
                else:
                    self.label_type = None
                
                if self.ui_row_container is not None:
                    self.ui_container.children = self.ui_container.children[:-1]
                    self.ui_row_container = None

    def handle_text_change(self, change):
        self.text = change.new

    def handle_row_number_change(self, change):
        self.item_row_number = change.new
        self.set_row_label()

    def handle_row_field_change(self, change):
        self.item_row_field = change.new
        self.set_row_label()

    def set_row_label(self):
        self.label = f"{self.field}/{self.item_row_number}/{self.item_row_field}"

    def as_label(self):
        label_json = {
            "label": self.label, 
            "value": [
                {
                    "page": self.border.page_ref,
                    "text": self.text,
                    "boundingBoxes": [self.border.get_normalized_bounding_box()]
                }
            ],
        }

        if self.label_type is not None:
            label_json['labelType'] = self.label_type

        return label_json

### Render the user feedback as UI inputs

The following code will render the user feedback as UI inputs in the notebook as output. You can update the label names associated with the custom model.

Each border previously drawn over the document will be rendered as a set of UI inputs in the notebook. The text is pre-populated with the OCR result for the region, and you can select the label from the fields available for the model and correct the text associated with it.

> **Note**: If the selected field is an array, additional inputs are displayed for the row number and the row field.

In [None]:
labels = [DocumentIntelligenceLabel(border, fields) for border in square_borders]
    
for label in labels:
    display(label.render())

### Create the labels JSON file

Once the user has updated the label names and text associated with the drawn borders, the following code will create the labels JSON file in the format required by the Azure AI Document Intelligence service. The file will be saved to the `./pdfs` directory with the name format `<pdf_file_name>.labels.json`.

In [None]:
labels_json = {
    "$schema": "https://schema.cognitiveservices.azure.com/formrecognizer/2021-03-01/labels.json",
    "document": pdf_file_name,
    "labels": [label.as_label() for label in labels]
}

labels_file_path = os.path.join(pdf_dir, f'{pdf_file_name}.labels.json')
with open(labels_file_path, 'w') as f:
    json.dump(labels_json, f, indent=4)
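
Before the labels file is used for retraining, it may be worth checking it for common mistakes, such as a border with no field selected. The sketch below is a hypothetical helper, not part of the service's tooling; it assumes the structure produced by `as_label` above, and the field name `InvoiceNumber` in the example is a placeholder.

```python
def find_label_problems(labels_json):
    """Return human-readable problems found in a labels dictionary."""
    problems = []
    for index, label in enumerate(labels_json.get("labels", [])):
        if not label.get("label"):
            problems.append(f"label {index}: no field selected")
        for value in label.get("value", []):
            for box in value.get("boundingBoxes", []):
                # Each box is expected to hold 4 (x, y) points in the range 0..1
                if len(box) != 8 or any(not 0 <= coord <= 1 for coord in box):
                    problems.append(f"label {index}: malformed bounding box")
    return problems


example = {
    "labels": [
        {"label": "InvoiceNumber", "value": [{"page": 1, "text": "123",
         "boundingBoxes": [[0.1, 0.1, 0.3, 0.1, 0.3, 0.3, 0.1, 0.3]]}]},
        {"label": "", "value": [{"page": 1, "text": "",
         "boundingBoxes": [[0.1, 0.1, 1.2, 0.1, 1.2, 0.3, 0.1, 0.3]]}]},
    ]
}
print(find_label_problems(example))
```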

## Retrain the model using the updated labels and PDF documents

Once the labels file has been created, the user feedback can be uploaded to the Azure Storage account alongside the PDF and layout analysis, and used to retrain the model with the Azure AI Document Intelligence service.

In [None]:
updated_model_version = "1.1.0"
updated_model_id = f"{model_name}-{updated_model_version}"

model_training_client.upload_training_data(pdf_dir)
updated_model = model_training_client.create_model(model_name=updated_model_id)
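
If the feedback loop runs repeatedly, the next model ID could be derived from the previous one instead of being hard-coded. The following sketch assumes the `<name>-<major>.<minor>.<patch>` naming used in this notebook; the helper `next_model_id` is hypothetical.

```python
def next_model_id(model_name, current_version, part="minor"):
    """Build the next model ID by incrementing one part of a x.y.z version."""
    major, minor, patch = (int(piece) for piece in current_version.split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"Unknown version part: {part}")
    return f"{model_name}-{major}.{minor}.{patch}"


print(next_model_id("invoices", "1.0.0"))  # invoices-1.1.0
```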