***

**<center><font size = "6">Automate Data Capture at Scale with Document AI<center>**
***
<center><font size = "2">Prepared by: Sitsawek Sukorn<center>

## Create and Test a Document AI Processor

### Part 1. Create and test a general form processor

### Task 1. Enable the Cloud Document AI API


Before you can begin using Document AI, you must enable the API.

- In Google Cloud Console, on the Navigation menu (Navigation menu icon), click APIs & services > Library.

- Search for Cloud Document AI API, then click the Enable button to use the API in your Google Cloud project.

### Task 2. Create a general form processor

Next you will create a Document AI processor using the Document AI Form Parser.

- In the console, on the Navigation menu (Navigation menu icon), click Document AI > Overview.

- Click Explore processors and select Form Parser, which is a type of general processor.

- Specify the processor name as form-parser and select the region US (United States) from the list.

- Click Create to create the general form-parser processor.

This will create the processor and return to the processor details page that will display the processor ID, status, and the prediction endpoint.

- Make a note of the Processor ID as you will use it with curl to make a POST call to the API in a later task.

### Task 3. Download the sample form

Download the form.pdf file to your local machine.

### Task 4. Upload a form for Document AI processing

- In Cloud Console, on your form-parser page, click the Upload Test Document button. A dialog will pop up - select the file you downloaded in the previous task for uploading.

### Task 5. Check output quality

The key/value pairs parsed from the source document will be presented in the Cloud Console. The left hand pane lists the data, and the right hand pane highlights with blue rectangles the source locations in the parsed document. Examine the output and compare the results with the source data.

### Part 2. Test a Document AI form processor using the API

### Task 6. Connect to the lab VM instance using SSH

You will perform the remainder of the lab tasks in the lab VM called document-ai-dev.

- In the Cloud Console, on the Navigation menu (Navigation menu icon), click Compute Engine > VM Instances.

- Click the SSH link for the VM Instance called document-ai-dev.

You will need the Document AI processor ID of the processor you created in Task 2 for this step. If you did not save it, then in the Cloud Console tab open the Navigation menu (Navigation menu icon), click Document AI > Processors, then click the name of your processor to open the details page. From here you can copy the processor ID.

- In the SSH session create an environment variable to contain the Document AI processor ID. You must replace the placeholder for [your processor id]:

In [None]:
export PROCESSOR_ID=[your processor id]

- In the SSH session confirm that the environment variable contains the Document AI processor ID:

In [None]:
echo Your processor ID is:$PROCESSOR_ID

This should print out the Processor ID similar to the following:
Your processor ID is:4897d834d2f4415d

### Task 7. Authenticate API requests

In order to make requests to the Document AI API, you need to provide a valid credential. In this task you will create a service account, limit the permissions granted to that service account to those required for the lab, and then generate a credential for that account that can be used to authenticate Document AI API requests.

- Set an environment variable with your Project ID, which you will use throughout this lab:


In [None]:
export PROJECT_ID=$(gcloud config get-value core/project)

- Create a new service account to access the Document AI API by using:

In [None]:
export SA_NAME="document-ai-service-account"
gcloud iam service-accounts create $SA_NAME --display-name $SA_NAME

- Bind the service account to the Document AI API user role:


In [None]:
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member="serviceAccount:$SA_NAME@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/documentai.apiUser"

- Create the credentials that will be used to log in as your new service account and save them in a JSON file called key.json in your working directory:

In [None]:
gcloud iam service-accounts keys create key.json \
--iam-account  $SA_NAME@${PROJECT_ID}.iam.gserviceaccount.com

- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the library to find your credentials, to point to the credentials file:

In [None]:
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/key.json"

- Check that the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to the full path of the credentials JSON file you created earlier:


In [None]:
echo $GOOGLE_APPLICATION_CREDENTIALS
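
If you want to go one step beyond echoing the variable, the check can be sketched in Python. This is a minimal sketch, not part of the lab files: the helper names `credentials_path` and `describe_key_file` are hypothetical, and the demo writes a fake key file with placeholder values instead of reading your real key.json. It assumes that a service account key file is JSON with a `type` of `service_account` and a `client_email` field.

```python
import json
import os
import tempfile


def credentials_path(environ=os.environ):
    """Return the path in GOOGLE_APPLICATION_CREDENTIALS, or fail clearly."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.isfile(path):
        raise FileNotFoundError(
            "GOOGLE_APPLICATION_CREDENTIALS does not point to a file"
        )
    return path


def describe_key_file(path):
    """Return (client_email, project_id) from a service account key file."""
    with open(path, "r", encoding="utf-8") as handle:
        key = json.load(handle)
    if key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key file")
    return key["client_email"], key.get("project_id")


# Demo with a fake key file (placeholder values, no real credentials).
with tempfile.TemporaryDirectory() as tmp:
    fake_key = os.path.join(tmp, "key.json")
    with open(fake_key, "w", encoding="utf-8") as handle:
        json.dump(
            {
                "type": "service_account",
                "client_email": "document-ai-service-account@my-project.iam.gserviceaccount.com",
                "project_id": "my-project",
            },
            handle,
        )
    path = credentials_path({"GOOGLE_APPLICATION_CREDENTIALS": fake_key})
    email, project = describe_key_file(path)

print(email, project)
```

In the lab VM you would call `credentials_path()` with no arguments so that it reads the real environment.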

### Task 8. Download the sample form to the VM instance


Now you can download a sample form and then base64 encode it for submission to the Document AI API.

- Enter the following command in the SSH window to download the sample form to your working directory:

In [None]:
gsutil cp gs://cloud-training/gsp924/health-intake-form.pdf .

- Create a JSON request file for submitting the base64 encoded form for processing:

In [None]:
echo '{"inlineDocument": {"mimeType": "application/pdf","content": "' > temp.json
base64 health-intake-form.pdf >> temp.json
echo '"}}' >> temp.json
cat temp.json | tr -d \\n > request.json
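
The same request body can be built in Python, which avoids the temporary file and the newline stripping. This is a minimal sketch: the function name `build_inline_request` is hypothetical, and the sample bytes are a placeholder rather than the real health-intake-form.pdf.

```python
import base64
import json


def build_inline_request(pdf_bytes, mime_type="application/pdf"):
    """Build the JSON body that the shell commands above assemble by hand."""
    return {
        "inlineDocument": {
            "mimeType": mime_type,
            # b64encode returns bytes with no line breaks; decode to str for JSON.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
        }
    }


# Placeholder bytes standing in for the contents of the PDF file.
sample_bytes = b"%PDF-1.4 placeholder bytes, not a real form"
request_body = json.dumps(build_inline_request(sample_bytes))

# Round trip: the encoded content decodes back to the original bytes.
decoded = base64.b64decode(json.loads(request_body)["inlineDocument"]["content"])
```

To use it with the real form, read the file in binary mode and write `request_body` to request.json.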

### Task 9. Make a synchronous process document request using curl

In this task you process the sample document by making a call to the synchronous Document AI API endpoint using curl.

- Submit a form for processing via curl. The result will be stored in output.json:

In [None]:
export LOCATION="us"
export PROJECT_ID=$(gcloud config get-value core/project)
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://${LOCATION}-documentai.googleapis.com/v1beta3/projects/${PROJECT_ID}/locations/${LOCATION}/processors/${PROCESSOR_ID}:process > output.json
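
The URL in the curl command is assembled from the location, project ID, and processor ID. The sketch below shows the same assembly in Python so that the pattern is easier to see. The function name `process_endpoint` and the project ID `my-project` are illustrative assumptions; the processor ID is the sample value shown earlier in this lab.

```python
def process_endpoint(project_id, location, processor_id, api_version="v1beta3"):
    """Return the synchronous process URL used by the curl command above."""
    return (
        f"https://{location}-documentai.googleapis.com/{api_version}"
        f"/projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}:process"
    )


url = process_endpoint("my-project", "us", "4897d834d2f4415d")
print(url)
```

Note that the location appears twice: once in the host name and once in the resource path.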

### Task 10. Extract the form entities

Next, explore some of the information extracted from the sample form.

- Extract the raw text detected in the document as follows:


In [None]:
cat output.json | jq -r ".document.text"

This lists all of the text detected in the uploaded document.

- Extract the list of form fields detected by the form processor:

In [None]:
cat output.json | jq -r ".document.pages[].formFields"

This lists the object data for all of the form fields detected in the document. The textAnchor.startIndex and textAnchor.endIndex values for each form field can be used to locate the field's name and value in the document.text field. The Python script that you will use later in this lab will do this mapping for you.

The JSON file is quite large as it includes the base64 encoded source document as well as all of the detected text and document properties. You can explore the JSON file by opening the file in a text editor or by using a JSON query tool like jq.
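
The offset lookup described above can be sketched directly against the JSON structure. This is a minimal sketch under stated assumptions: the sample dictionary is hand-written to mimic the shape of output.json, the helper name `anchor_text` is hypothetical, and the code assumes that index values may arrive as strings and that a startIndex of 0 may be omitted, so it converts with `int()` and defaults the start to 0.

```python
def anchor_text(text_anchor, full_text):
    """Return the text referenced by a textAnchor object."""
    parts = []
    for segment in text_anchor.get("textSegments", []):
        start = int(segment.get("startIndex", 0))  # assumed omitted when 0
        end = int(segment["endIndex"])
        parts.append(full_text[start:end])
    return "".join(parts)


# Hand-written stand-in for json.load(open("output.json")).
sample = {
    "document": {
        "text": "Date:\n9/14/19\n",
        "pages": [
            {
                "formFields": [
                    {
                        "fieldName": {
                            "textAnchor": {"textSegments": [{"endIndex": "6"}]}
                        },
                        "fieldValue": {
                            "textAnchor": {
                                "textSegments": [
                                    {"startIndex": "6", "endIndex": "14"}
                                ]
                            }
                        },
                    }
                ]
            }
        ],
    }
}

text = sample["document"]["text"]
pairs = [
    (
        anchor_text(field["fieldName"]["textAnchor"], text),
        anchor_text(field["fieldValue"]["textAnchor"], text),
    )
    for page in sample["document"]["pages"]
    for field in page.get("formFields", [])
]
print(pairs)
```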

### Part 3. Test a Document AI form processor using the Python client libraries

### Task 11. Configure your VM Instance to use the Document AI Python client


Now install the Python Google Cloud client libraries into the VM Instance.

- Enter the following command in the SSH terminal shell to import the lab files into your VM Instance:

In [None]:
gsutil cp gs://cloud-training/gsp924/synchronous_doc_ai.py .

- Enter the following command to install the Python client libraries required for Document AI and the other libraries required for this lab:

In [None]:
python3 -m pip install --upgrade google-cloud-documentai google-cloud-storage prettytable 

### Task 12. Review the Document AI API Python code

- The first two code blocks import the required libraries and parse the command-line parameters to initialize variables that identify the Document AI processor and input data.


In [None]:
import argparse
from google.cloud import documentai_v1beta3 as documentai
from google.cloud import storage
from prettytable import PrettyTable
parser = argparse.ArgumentParser()
parser.add_argument("-P", "--project_id", help="Google Cloud Project ID")
parser.add_argument("-D", "--processor_id", help="Document AI Processor ID")
parser.add_argument("-F", "--file_name", help="Input file name", default="form.pdf")
parser.add_argument("-L", "--location", help="Processor Location", default="us")
args = parser.parse_args()
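
To see how the parser behaves without running the whole script, you can pass it an explicit argument list. The sketch below rebuilds the same parser inside a hypothetical `build_parser` function; the project ID `my-project` is a placeholder.

```python
import argparse


def build_parser():
    """Recreate the argument parser used by the lab script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-P", "--project_id", help="Google Cloud Project ID")
    parser.add_argument("-D", "--processor_id", help="Document AI Processor ID")
    parser.add_argument("-F", "--file_name", help="Input file name", default="form.pdf")
    parser.add_argument("-L", "--location", help="Processor Location", default="us")
    return parser


# Long options, as used when the script is run later in the lab.
args = build_parser().parse_args(
    [
        "--project_id=my-project",
        "--processor_id=4897d834d2f4415d",
        "--file_name=health-intake-form.pdf",
    ]
)

# Short options; file_name and location fall back to their defaults.
short_args = build_parser().parse_args(["-P", "my-project", "-D", "4897d834d2f4415d"])
print(args.location, short_args.file_name)
```

Note that project_id and processor_id have no defaults, so they are None if you omit them.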

- The process_document function is used to make a synchronous call to a Document AI processor. The function creates a Document AI API client object.


The processor name required by the API call is created using the project_id, location, and processor_id parameters, and the document to be processed is read in and stored, together with its mime_type, in a document structure.

The processor name and the document are then passed to the Document AI API client object and a synchronous call to the API is made. If the request is successful the document object that is returned will include properties that contain the data that has been detected by the Document AI processor.

In [None]:
def process_document(project_id, location, processor_id, file_path ):
    # Instantiates a client
    client = documentai.DocumentProcessorServiceClient()
    # The full resource name of the processor, e.g.:
    # projects/project-id/locations/location/processors/processor-id
    # You must create new processors in the Cloud Console first
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()
        
    # Create the document object 
    document = {"content": image_content, "mime_type": "application/pdf"}
    # Configure the process request
    request = {"name": name, "document": document}
    # Use the Document AI client synchronous endpoint to process the request
    result = client.process_document(request=request)
    return result.document

- The script then calls the process_document function with the required parameters and saves the response in the document variable.


In [None]:
document = process_document(args.project_id,args.location,args.processor_id,args.file_name )

- The final block of code prints the .text property that contains all of the text detected in the document then displays the form information using the text anchor data for each of the form fields detected by the form parser.


In [None]:
print("Document processing complete.")
print("Text: {}".format(document.text))
# Define a function to retrieve an object dictionary for a named element
def get_text(doc_element: dict, document: dict):
    """
    Document AI identifies form fields by their offsets
    in document text. This function converts offsets
    to text snippets.
    """
    response = ""
    # If a text segment spans several lines, it will
    # be stored in different text segments.
    for segment in doc_element.text_anchor.text_segments:
        start_index = (
            int(segment.start_index)
            if segment in doc_element.text_anchor.text_segments
            else 0
        )
        end_index = int(segment.end_index)
        response += document.text[start_index:end_index]
    return response
# Grab each key/value pair and their corresponding confidence scores.
document_pages = document.pages
print("Form data detected:\n")
# For each page fetch each form field and display fieldname, value and confidence scores
for page in document_pages:
    print("Page Number:{}".format(page.page_number))
    for form_field in page.form_fields:
        fieldName=get_text(form_field.field_name,document)
        nameConfidence = round(form_field.field_name.confidence,4)
        fieldValue = get_text(form_field.field_value,document)
        valueConfidence = round(form_field.field_value.confidence,4)
        print(fieldName+fieldValue +"  (Confidence Scores: (Name) "+str(nameConfidence)+", (Value) "+str(valueConfidence)+")\n")

### Task 13. Run the Document AI Python code


Execute the sample code and process the same file as before.

- Create environment variables for the Project ID and the IAM service account credentials file:

In [None]:
export PROJECT_ID=$(gcloud config get-value core/project)
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/key.json"

- Call the synchronous_doc_ai.py python program with the parameters it requires:

In [None]:
python3 synchronous_doc_ai.py \
--project_id=$PROJECT_ID \
--processor_id=$PROCESSOR_ID \
--location=us \
--file_name=health-intake-form.pdf | tee results.txt

You will see the following block of text output:

FakeDoc M.D.
HEALTH INTAKE FORM
Please fill out the questionnaire carefully. The information you provide will be used to complete
your health profile and will be kept confidential.
Date:
Sally
Walker
Name:
9/14/19
...

The first block of text is a single text element containing all of the text in the document. This block of text does not include any awareness of form based data so some items, such as the Date and Name entries, are mixed together in this raw text value.

The code then outputs a more structured view of the data using the form data that the form-parser has inferred from the document structure. This structured output also includes the confidence score for the form field names and values. The output from this section gives a much more useful mapping between the form field names and the values, as can be seen with the link between the Date and Name form fields and their correct values.



Form data detected:
Page Number:1
Phone #: (906) 917-3486
(Confidence Scores: (Name) 1.0, (Value) 1.0)
...
Date:
9/14/19
(Confidence Scores: (Name) 0.9999, (Value) 0.9999)
...
Name:
Sally
Walker
(Confidence Scores: (Name) 0.9973, (Value) 0.9973)
...


***

**<center><font size = "6">Process Documents with Python Using the Document AI API<center>**
***

### Task 1. Create and test a general form processor

#### Enable the Cloud Document AI API

Before you can begin using Document AI, you must enable the API.

- In Cloud Console, from the Navigation menu (Navigation menu icon), click APIs & services > Library.

- Search for Cloud Document AI API, then click the Enable button to use the API in your Google Cloud project.

If the Cloud Document AI API is already enabled you will see the Manage button and you can continue with the rest of the lab.

#### Create a general form processor

Create a Document AI processor using the Document AI form parser.

- In the console, on the Navigation menu (Navigation menu icon), click Document AI > Overview.

- Click Create processor and select Form Parser, which is a type of general processor.

- Specify the processor name as form-parser and select the region US (United States) from the list.

- Click Create to create the general form-parser processor.

This will create the processor and return to the processor details page that will display the processor ID, status, and the prediction endpoint.

- Make a note of the Processor ID as you will need to update variables in JupyterLab notebooks with the Processor ID in later tasks.

### Task 2. Configure your Vertex AI Notebooks instance to perform Document AI API calls


Next you will connect to JupyterLab running on the Vertex AI Notebooks instance that was created for you when the lab was started and then configure that environment for the remaining lab tasks.

- In the Cloud Console, on the Navigation menu, click Vertex AI > Workbench.

- Click Open Jupyterlab to open the JupyterLab console on your Vertex AI Notebooks instance.

- Click Terminal to open a terminal shell inside the Vertex AI Notebooks instance.

- Enter the following command in the terminal shell to import the lab files into your Vertex AI Notebooks instance:

In [None]:
gsutil cp gs://cloud-training/gsp925/*.ipynb .

- Enter the following command in the terminal shell to install the Python client libraries required for Document AI and other required libraries:


In [None]:
python -m pip install --upgrade google-cloud-core google-cloud-documentai google-cloud-storage prettytable 

You should see output indicating that the libraries have been installed successfully.

- Enter the following command in the terminal shell to import the sample health intake form:

In [None]:
gsutil cp gs://cloud-training/gsp925/health-intake-form.pdf form.pdf

- In the notebook interface open the JupyterLab notebook called documentai-sync.ipynb.

### Task 3. Make a synchronous process document request

#### Review the Python code for synchronous Document AI API calls

Take a minute to review the Python code in the documentai-sync.ipynb notebook.

The first code block imports the required libraries and initializes some variables.

In [None]:
from google.cloud import documentai_v1beta3 as documentai
from google.cloud import storage
from prettytable import PrettyTable
project_id = %system gcloud config get-value core/project
project_id = project_id[0]
location = 'us'           
file_path = 'form.pdf'    

The Set your Processor ID code cell sets the Processor ID that you have to manually set before you can process documents with the notebook.

In [None]:
processor_id = 'PROCESSOR_ID' # TODO: Replace with a valid Processor ID   

The Process Document Function code cell defines the process_document function that is used to make a synchronous call to a Document AI processor. The function creates a Document AI API client object.

The processor name required by the API call is created using the project_id, location, and processor_id parameters, and the sample PDF document is read in and stored, together with its mime_type, in a document structure.

The function creates a request object that contains the full resource name of the processor and the document, and uses that object as the parameter for a synchronous call to the Document AI API client. If the request is successful the document object that is returned will include properties that contain the entities detected in the form.

In [None]:
def process_document(
            project_id=project_id, location=location,
            processor_id=processor_id,  file_path=file_path 
    ):
    # Instantiates a client
    client = documentai.DocumentProcessorServiceClient()
    # The full resource name of the processor, e.g.:
    # projects/project-id/locations/location/processors/processor-id
    # You must create new processors in the Cloud Console first
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()
    # Create the document object
    document = {"content": image_content, "mime_type": "application/pdf"}
    # Configure the process request
    request = {"name": name, "document": document}
    # Use the Document AI client to process the sample form
    result = client.process_document(request=request)
    return result.document

The Process Document code cell calls the process_document function, saves the response in the document variable, and prints the raw text that has been detected. All of the processors will report some data for the document.text property.

In [None]:
document=process_document()
# print all detected text. 
# All document processors will display the text content
print("Document processing complete.")
print("Text: {}".format(document.text))

The Get Text Function code cell defines the get_text() function that retrieves the text for a named element using the text_anchor start_index and end_index properties of the named element's text_segments. This function is used to retrieve the form name and form value for form data if that data is returned by the processor.

In [None]:
def get_text(doc_element: dict, document: dict):
    """
    Document AI identifies form fields by their offsets
    in document text. This function converts offsets
    to text snippets.
    """
    response = ""
    # If a text segment spans several lines, it will
    # be stored in different text segments.
    for segment in doc_element.text_anchor.text_segments:
        start_index = (
            int(segment.start_index)
            if segment in doc_element.text_anchor.text_segments
            else 0
        )
        end_index = int(segment.end_index)
        response += document.text[start_index:end_index]
    return response

The Display Form Data cell iterates over all pages that have been detected and for each form_field detected it uses the get_text() function to retrieve the field name and field value. Those values are then printed out, along with their corresponding confidence scores. Form data will be returned by processors that use the general form parser or the specialized parsers but will not be returned by processors that were created with the Document OCR parser.

In [None]:
document_pages = document.pages
print("Form data detected:\n")
# For each page fetch each form field and display fieldname, value and confidence scores
for page in document_pages:
    print("Page Number:{}".format(page.page_number))
    for form_field in page.form_fields:
        fieldName=get_text(form_field.field_name,document)
        nameConfidence = round(form_field.field_name.confidence,4)
        fieldValue = get_text(form_field.field_value,document)
        valueConfidence = round(form_field.field_value.confidence,4)
        print(fieldName+fieldValue +"  (Confidence Scores: (Name) "+str(nameConfidence)+", (Value) "+str(valueConfidence)+")\n")
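
If you want to keep the form data rather than only print it, the same loop can collect rows and drop low-confidence fields. This is a minimal sketch, not part of the notebook: the `SimpleNamespace` objects are stand-ins that mimic only the attributes the loop reads, the names `form_fields_to_rows` and `min_confidence` are hypothetical, and using the lower of the name and value confidence is an illustrative choice.

```python
from types import SimpleNamespace as NS


def anchor_text(element, full_text):
    """Join the text slices referenced by an element's text segments."""
    return "".join(
        full_text[int(seg.start_index):int(seg.end_index)]
        for seg in element.text_anchor.text_segments
    )


def form_fields_to_rows(document, min_confidence=0.0):
    """Return (page, name, value, confidence) tuples for each form field."""
    rows = []
    for page in document.pages:
        for field in page.form_fields:
            name = anchor_text(field.field_name, document.text).strip()
            value = anchor_text(field.field_value, document.text).strip()
            confidence = min(field.field_name.confidence, field.field_value.confidence)
            if confidence >= min_confidence:
                rows.append((page.page_number, name, value, round(confidence, 4)))
    return rows


def element(segments, confidence):
    """Build a stand-in for a field name or field value object."""
    anchors = [NS(start_index=s, end_index=e) for s, e in segments]
    return NS(text_anchor=NS(text_segments=anchors), confidence=confidence)


# Stand-in document; the Name value spans two text segments.
doc = NS(
    text="Date:\n9/14/19\nName:\nSally\nWalker\n",
    pages=[
        NS(
            page_number=1,
            form_fields=[
                NS(field_name=element([(0, 6)], 0.9999),
                   field_value=element([(6, 14)], 0.9999)),
                NS(field_name=element([(14, 20)], 0.9973),
                   field_value=element([(20, 26), (26, 33)], 0.9973)),
            ],
        )
    ],
)

all_rows = form_fields_to_rows(doc)
confident_rows = form_fields_to_rows(doc, min_confidence=0.999)
print(confident_rows)
```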

The Display Entity Data cell extracts entity data from the document object and displays the entity type, value, and confidence properties for each entity detected. Entity data is only returned by processors that use specialized Document AI parsers such as the Procurement Expense parser. The general form parser and the Document OCR parser will not return entity data.

In [None]:
if 'entities' in dir(document):
    entities = document.entities
    # Grab each key/value pair and their confidence scores.
    table = PrettyTable(['Type', 'Value', 'Confidence'])
    for entity in entities:
        entity_type = entity.type_
        value = entity.mention_text
        confidence = round(entity.confidence,4)
        table.add_row([entity_type, value, confidence])
    # Print the table once, after all entities have been added
    print(table)
else:
    print("Document does not contain entity data.")

### Task 4. Run the synchronous Document AI Python code


Execute the code to make synchronous calls to the Document AI API in the JupyterLab notebook.

- In the second Set your Processor ID code cell replace the PROCESSOR_ID placeholder text with the Processor ID for the form-parser processor you created in an earlier step.

- Select the first cell, click the Run menu and then click Run Selected Cell and All Below to run all the code in the notebook.

If you have used the sample health intake form, you will see data similar to the following in the output cell for the form data:

Form data detected:
Page Number:1
Phone #: (906) 917-3486
  (Confidence Scores: (Name) 1.0, (Value) 1.0)
...
Date:
9/14/19
  (Confidence Scores: (Name) 0.9999, (Value) 0.9999)
...
Name:
Sally
Walker
  (Confidence Scores: (Name) 0.9973, (Value) 0.9973)
  ...

If you are able to create a specialised processor the final cell will display entity data, otherwise it will show an empty table.

- In the JupyterLab menu click File and then click Save Notebook to save your progress.

### Task 5. Create a Document AI Document OCR processor

In this task you will create a Document AI processor using the general Document OCR parser.

- In the Cloud Console, on the Navigation menu, click Document AI > Overview.

- Click Create Processor and then select Document OCR. This is a type of general processor.

- Specify the processor name as ocr-processor and select the region US (United States) from the list.

- Click Create to create your processor.

- Make a note of the processor ID. You will need to specify this in a later task.

### Task 6. Prepare your environment for asynchronous Document AI API calls


In this task you upload the sample JupyterLab notebook to test asynchronous Document AI API calls and copy some sample forms for the lab to Cloud Storage for asynchronous processing.

- Click the Terminal tab to re-open the terminal shell inside the Vertex AI Notebooks instance.

- Create a Cloud Storage bucket for the input documents and copy the sample W2 forms into the bucket:

In [None]:
    export PROJECT_ID="$(gcloud config get-value core/project)"
    export BUCKET="${PROJECT_ID}"_doc_ai_async
    gsutil mb gs://${BUCKET}
    gsutil -m cp gs://cloud-training/gsp925/async/*.* gs://${BUCKET}/input

- In the notebook interface open the JupyterLab notebook called documentai-async.ipynb.

### Task 7. Make an asynchronous process document request


#### Review the Python code for asynchronous Document AI API calls

Take a minute to review the Python code in the documentai-async.ipynb notebook.

The first code cell imports the required libraries.

In [None]:
from google.cloud import documentai_v1beta3 as documentai
from google.cloud import storage
import re
import os
import pandas as pd
import simplejson as json

The Set your Processor ID code cell sets the Processor ID that you have to manually set before you can process documents with the notebook.

In [None]:
processor_id = "PROCESSOR_ID"  # TODO: Replace with a valid Processor ID

The Set your variables code cell defines the parameters that will be used to make the asynchronous call, including the location of the input and output Cloud Storage buckets that will be used for the source data and output files. You will update the placeholder values in this cell for the PROJECT_ID and the PROCESSOR_ID in the next section of the lab before you run the code. The other variables contain defaults for the processor location, input Cloud Storage Bucket, and output Cloud Storage bucket that you do not need to change.

In [None]:
project_id = %system gcloud config get-value core/project
project_id = project_id[0]
location = 'us'           # Replace with 'eu' if processor does not use 'us' location
gcs_input_bucket  = project_id+"_doc_ai_async"   # Bucket name only, no gs:// prefix
gcs_input_prefix  = "input/"                     # Input bucket folder e.g. input/
gcs_output_bucket = project_id+"_doc_ai_async"   # Bucket name only, no gs:// prefix
gcs_output_prefix = "output/"                    # Output bucket folder e.g. output/
timeout = 300

The Define Google Cloud client objects code cell initializes the Document AI and Cloud Storage clients.

In [None]:
client_options = {"api_endpoint": "{}-documentai.googleapis.com".format(location)}
client = documentai.DocumentProcessorServiceClient(client_options=client_options)
storage_client = storage.Client()

The Create input configuration code cell creates the input configuration array parameter for the source data that will be passed to the asynchronous Document AI request as an input configuration. This array stores the Cloud Storage source location, and the mime type, for each of the files that are found in the input Cloud Storage location.

In [None]:
blobs = storage_client.list_blobs(gcs_input_bucket, prefix=gcs_input_prefix)
input_configs = []
print("Input Files:")
for blob in blobs:
    if ".pdf" in blob.name:
        source = "gs://{bucket}/{name}".format(bucket = gcs_input_bucket, name = blob.name)
        print(source)
        input_config = documentai.types.document_processor_service.BatchProcessRequest.BatchInputConfig(
            gcs_source=source, mime_type="application/pdf"
        )
        input_configs.append(input_config)
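
The selection step in this cell can be tried without Cloud Storage by working on plain object names. This is a minimal sketch: the function name `pdf_sources` and the sample names are hypothetical. It also uses a stricter test than the notebook, matching names that end in .pdf (in any letter case), whereas the notebook's `".pdf" in blob.name` matches the string anywhere in the name.

```python
def pdf_sources(bucket, blob_names):
    """Return gs:// URIs for the object names that end in .pdf."""
    return [
        f"gs://{bucket}/{name}"
        for name in blob_names
        if name.lower().endswith(".pdf")
    ]


# Placeholder object names, including a folder marker and a non-PDF file.
names = ["input/", "input/w2-1.pdf", "input/notes.pdf.txt", "input/W2-2.PDF"]
sources = pdf_sources("my-project_doc_ai_async", names)
print(sources)
```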

The Create output configuration code cell creates the output parameter for the asynchronous request containing the output Cloud Storage bucket location and stores that as a Document AI batch output configuration.

In [None]:
destination_uri = f"gs://{gcs_output_bucket}/{gcs_output_prefix}"
output_config = documentai.types.document_processor_service.BatchProcessRequest.BatchOutputConfig(
    gcs_destination=destination_uri
)

The Create the Document AI API request code cell builds the asynchronous Document AI batch process request object using the input and output configuration objects.

In [None]:
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
request = documentai.types.document_processor_service.BatchProcessRequest(
    name=name,
    input_configs=input_configs,
    output_config=output_config,
)

The Start the batch (asynchronous) API operation code cell makes an asynchronous document process request by passing the request object to the batch_process_documents() method. This is an asynchronous call so you use the result() method to force the notebook to wait until the background asynchronous job has completed.

In [None]:
operation = client.batch_process_documents(request)
# Wait for the operation to finish
operation.result(timeout=timeout)
print ("Batch process  completed.")
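
The blocking behaviour of result() with a timeout can be illustrated with the standard library. This is only an analogy, not the Document AI client: it uses a `concurrent.futures` future as a stand-in for the operation object, and the names `wait_for` and `slow_job` are hypothetical. The point it shows is that a timeout stops the waiting, not the background job, so the result can still be fetched later.

```python
import concurrent.futures
import threading


def wait_for(future, timeout):
    """Block until the future finishes; report a timeout instead of raising."""
    try:
        return True, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return False, None


release = threading.Event()


def slow_job():
    """Stand-in for a background batch job; finishes once released."""
    release.wait()
    return "Batch process completed."


executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_job)

first = wait_for(future, timeout=0.05)  # job is still blocked, so this times out
release.set()                           # let the background job finish
second = wait_for(future, timeout=5)    # waiting again now succeeds
executor.shutdown()

print(first, second)
```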

The Fetch list of output files cell enumerates the objects in the output bucket location as defined in the destination_uri variable.

The Display detected text from asynchronous output JSON files cell loads each output JSON file that is found as a Document AI document object and the text data detected by the Document OCR processor is printed out.

The Display entity data cell will display any entity data that is found, however, entity data is only available for processors that were created using a specialized parser. Entity data will not be displayed with the general Document AI OCR parser used in this task.
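
The code for these last cells is not reproduced here, but the text extraction step can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes each output file is a JSON document with a top-level `text` field, it works on an in-memory mapping of object names to JSON strings instead of real Cloud Storage objects, and the function name `texts_from_outputs` and the object names are hypothetical.

```python
import json


def texts_from_outputs(outputs):
    """Map each .json object name to the text stored in its payload."""
    results = {}
    for name, payload in sorted(outputs.items()):
        if not name.endswith(".json"):
            continue  # skip anything that is not an output JSON file
        results[name] = json.loads(payload).get("text", "")
    return results


# Placeholder objects standing in for the contents of the output location.
outputs = {
    "output/123/0/form-0.json": json.dumps({"text": "FakeDoc M.D.\n"}),
    "output/123/0/readme.txt": "ignored",
}
texts = texts_from_outputs(outputs)
print(texts)
```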



#### Run the asynchronous Document AI Python code


Use the sample code provided for you in the JupyterLab notebook to process documents asynchronously using a Document AI batch processing request.

- In the second code cell replace the PROCESSOR_ID placeholder text with the Processor ID for the ocr-processor processor you created in Task 5.

- Select the first cell, click the Run menu and then click Run Selected Cell and All Below to run all the code in the notebook.

- As the code cells execute, you can step through the notebook reviewing the code and the comments that explain how the asynchronous request object is created and used.

The notebook will take a minute or two to wait for the asynchronous batch process operation to complete at the Start the batch (asynchronous) API operation code cell. While the batch process API call itself is asynchronous the notebook uses the result method to force the notebook to wait until the asynchronous call has completed before enumerating and displaying the output data.

If the asynchronous job takes longer than expected and times out you may have to run the remaining cells again to display the output. These are the cells after the Start the batch (asynchronous) API operation cell.

Your output will contain text listing the Document AI data detected in each file. The Document OCR parser does not detect form or entity data so there will be no form or entity data produced. If you can create a specialised processor then you will also see entity data printed out by the final cell.

- In the JupyterLab menu click File and then click Save Notebook to save your progress.

Document processing complete.
Text: FakeDoc M.D.
HEALTH INTAKE FORM
Please fill out the questionnaire carefully. The information you provide will be used to complete
your health profile and will be kept confidential.
Date:
Sally
Walker
Name:
9/14/19
...
