# Challenge 03: Data Modelling: From Retrieval to Upload (1/2)

## Introduction

In this step, we will structure the data retrieved from Azure Document Intelligence (ADI) into the right format to be read by our systems in subsequent steps. 

ADI returns the data as a JSON file, and it is our role to process and organize it. Some of the data will be structured into tables, while other data will be formatted as text. This step ensures that the extracted information is organized in a meaningful way for further analysis and use.

As stated before, we need to make sure that our Function will know how to process:
- **Loan Forms:** Extract relevant details such as borrower information, loan amounts, and terms.
- **Loan Contract:** Identify and parse key contract elements like clauses, signatures, and dates.
- **Pay Stubs:** Retrieve data such as employee details, earnings, deductions, and net pay.

Not all customers will have provided every type of document, and during this Challenge we will only be processing one file at a time. In the next challenge we will add a trigger, which will likewise process a single document per invocation.

Because each document type needs its own handling, this challenge is split into three parts, one per document type.

## Loan Forms 

The first step in getting a loan is to fill out a form with some basic details, such as Customer ID, Full Name, and Date of Birth, so that is where we will start.

This particular document combines text and tables, which ADI can extract separately.
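
To make the "text plus tables" split concrete, here is a small hand-written sample of the JSON shape that the code in this challenge reads. The key names (`pages`, `lines`, `text`, `tables`, `row_count`, `column_count`, `cells`, `row_index`, `column_index`, `content`) are the ones used by the functions further down; every value is invented for illustration, and a real ADI result will contain many more fields.

```python
# Hypothetical sample: key names match what the code in this challenge reads,
# values are invented for illustration only.
sample_adi_output = {
    "pages": [
        {
            "lines": [
                {"text": "Loan Application Form"},
                {"text": "Customer ID: 100002"},
            ]
        }
    ],
    "tables": [
        {
            "row_count": 2,
            "column_count": 2,
            "cells": [
                {"row_index": 0, "column_index": 0, "content": "Employer Name"},
                {"row_index": 0, "column_index": 1, "content": "Job Title"},
                {"row_index": 1, "column_index": 0, "content": "Contoso Ltd"},
                {"row_index": 1, "column_index": 1, "content": "Analyst"},
            ],
        }
    ],
}

# Free text lives under pages -> lines -> text
lines = [
    line["text"]
    for page in sample_adi_output["pages"]
    for line in page["lines"]
]

# Table cells carry their own grid coordinates
first_table = sample_adi_output["tables"][0]
header = [c["content"] for c in first_table["cells"] if c["row_index"] == 0]

print(lines)
print(header)
```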

To start our analysis, let's create a function that loads documents from a folder inside a container, which in turn lives inside our designated Storage Account. In this step, we will retrieve one Loan Form from the Loan Forms folder to analyse.

We will then reuse this same function to access the other folders, which contain the other types of documents.


**Question: why are we not batch-analysing documents?**


In [154]:
import os
import json
from azure.storage.blob import BlobServiceClient
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

def read_json_files_from_blob(folder_path):
    # Retrieve the connection string from the environment variables
    connection_string = os.getenv('connection_string')

    # Ensure the connection string is not None
    if connection_string is None:
        raise ValueError("The connection string environment variable is not set.")

    # Create a BlobServiceClient
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)

    # Get the container client
    container_client = blob_service_client.get_container_client("bankdetail")

    # List all blobs in the specified folder
    blob_list = container_client.list_blobs(name_starts_with=folder_path)

    # Filter out JSON files and read their contents
    for blob in blob_list:
        if blob.name.endswith('.json'):
            blob_client = container_client.get_blob_client(blob.name)
            blob_data = blob_client.download_blob().readall()
            data = json.loads(blob_data)
            # print(f"Contents of {blob.name}:")
            # print(json.dumps(data, indent=2))
            # print("\n")
            return data 
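
Note that the function above returns as soon as it finds the first `.json` blob, and returns `None` when the folder holds no JSON file. The sketch below reproduces only that selection logic against an in-memory stand-in, so it can be tried without a Storage Account. The `first_json_document` helper and the `fake_container` dictionary are hypothetical names introduced for this illustration.

```python
import json

def first_json_document(blobs, folder_path):
    """Return the parsed content of the first .json blob under folder_path, or None.

    `blobs` is a hypothetical stand-in for the container: a dict that maps
    blob names to raw bytes.
    """
    for name, raw in blobs.items():
        if name.startswith(folder_path) and name.endswith(".json"):
            return json.loads(raw)
    return None

fake_container = {
    "loanform/readme.txt": b"not json",
    "loanform/form_a.json": b'{"pages": []}',
    "loanform/form_b.json": b'{"pages": [1]}',
}

first = first_json_document(fake_container, "loanform")
missing = first_json_document(fake_container, "paystubs")
print(first)
print(missing)
```

Callers should therefore check for `None` before passing the result to the processing functions.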

Now all we have to do is call our function and pass the name of our folder as an argument.

In [155]:
loanform = read_json_files_from_blob("loanform")  # TODO: remove this line so participants figure out what they are doing

The next step is to create a function that processes the loan application form data. It takes the loan application form data as input and returns the structured result. Our input is a JSON file composed of both text and tables, and we need to treat the two separately.

The processing is split across several helper functions, described one by one below.

The create_structured_tables function takes a list of tables, builds each one as a grid filled with the cell content, merges the last two rows of any table that has 3 rows and 5 columns, and returns the structured tables along with the merged rows.

In [156]:
def create_structured_tables(tables):
    structured_tables = []
    combined_rows = []
    
    for table in tables:
        row_count = table.get("row_count", 0)
        column_count = table.get("column_count", 0)
        cells = table.get("cells", [])
        
        # Initialize an empty table
        structured_table = [["" for _ in range(column_count)] for _ in range(row_count)]
        
        # Populate the table with cell content
        for cell in cells:
            row_index = cell.get("row_index", 0)
            column_index = cell.get("column_index", 0)
            content = cell.get("content", "")
            structured_table[row_index][column_index] = content
        
        # Combine the last row with the previous one if the table has 5 columns and 3 rows
        if row_count == 3 and column_count == 5:
            combined_row = [structured_table[1][i] + " " + structured_table[2][i] for i in range(column_count)]
            structured_table[1] = combined_row
            structured_table = structured_table[:2]
            combined_rows.append(combined_row)
        
        # Append the structured table to the list
        structured_tables.append(structured_table)
    
    return structured_tables, combined_rows
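
As a worked example of the row merge, the sketch below rebuilds a 3x5 table from its cells and joins rows 1 and 2 cell by cell, the same way the function above does. `build_grid` and `merge_wrapped_row` are hypothetical helper names, and the table content is invented. One thing the example makes visible: when the lower cell is empty, the merged value keeps a trailing space, so a later `strip()` may be worthwhile.

```python
def build_grid(table):
    """Place every cell at its (row_index, column_index) position."""
    grid = [["" for _ in range(table["column_count"])]
            for _ in range(table["row_count"])]
    for cell in table["cells"]:
        grid[cell["row_index"]][cell["column_index"]] = cell["content"]
    return grid

def merge_wrapped_row(grid):
    """Join rows 1 and 2 cell by cell and keep the header row."""
    merged = [f"{grid[1][i]} {grid[2][i]}" for i in range(len(grid[1]))]
    return [grid[0], merged]

# Invented content: a data row whose text wrapped onto a second line
rows = [
    ["Employer Name", "Job Title", "Annual Income", "Years Employed", "Contact Number"],
    ["Contoso", "Senior", "$85,000", "5", "(555)"],
    ["Ltd", "Analyst", "", "", "123-4567"],
]
table = {
    "row_count": 3,
    "column_count": 5,
    "cells": [
        {"row_index": r, "column_index": c, "content": text}
        for r, row in enumerate(rows)
        for c, text in enumerate(row)
        if text  # empty cells are simply absent from the cell list
    ],
}

header, merged = merge_wrapped_row(build_grid(table))
print(merged)
```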

The clean_form_recognizer_result function processes the Form Recognizer output. It collects the text of every line up to the first line that contains the word "table" (that line and everything after it are left out of the collected text), strips every key except "text" from the lines that do not contain "table", and builds structured tables from the table data.

In [157]:
def clean_form_recognizer_result(data):
    text_data = []
    table_encountered = False
    
    for page in data.get("pages", []):
        for line in page.get("lines", []):
            # Check if the line contains the word "table"
            if "table" in line.get("text", "").lower():
                table_encountered = True
                continue  # Skip the line if "table" is in the text
            
            if not table_encountered:
                # Collect the "text" information
                text_data.append(line.get("text", ""))
            
            # Keep only the "text" key
            line_keys = list(line.keys())
            for key in line_keys:
                if key != "text":
                    del line[key]
    
    # Create structured tables
    structured_tables, combined_rows = create_structured_tables(data.get("tables", []))
    data["structured_tables"] = structured_tables
    data["combined_rows"] = combined_rows
    data["text_data"] = text_data
    
    return data
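
The text-collection rule is easy to misread, so here is a minimal sketch of just that part, working on plain strings instead of line dictionaries. `collect_text_until_table` is a hypothetical name and the sample lines are invented. The check is a substring match, so any line containing "table" (including inside a longer word) acts as the cut-off.

```python
def collect_text_until_table(lines):
    """Keep lines until the first one that contains 'table' (case-insensitive)."""
    collected = []
    table_seen = False
    for text in lines:
        if "table" in text.lower():
            table_seen = True
            continue  # the marker line itself is never collected
        if not table_seen:
            collected.append(text)
    return collected

sample_lines = [
    "Applicant Information",
    "Full Name: Jane Elizabeth Smith",
    "Employment Table",
    "Employer Name",
    "Loan Information",
]

kept = collect_text_until_table(sample_lines)
print(kept)
```

In this sample, "Loan Information" is dropped because it comes after the marker line, which is worth keeping in mind if a document places text sections after its tables.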

The tables_to_dataframes function converts a list of structured tables into a list of pandas DataFrames.

In [158]:
def tables_to_dataframes(structured_tables):
    dataframes = []
    for table in structured_tables:
        df = pd.DataFrame(table)
        dataframes.append(df)
    return dataframes

We have now retrieved both our tables, in the desired structure, and the text from our files. However, the text is not yet as structured as we need it to be.

As an example, we have so far extracted a key-value pair whose key is "text" and whose value is "Contact Number: (555) 234-5678". What we need to do now is split the field name from its value, so that the key-value pair becomes key "Contact Number" and value "(555) 234-5678".
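
For a single line, that split can be sketched with `str.partition`, which cuts at the first colon only. This is a simplified illustration under the assumption that each field sits on its own line; `split_label_value` is a hypothetical helper, and the notebook's own code uses regular expressions over the joined text instead.

```python
def split_label_value(line):
    """Split 'Label: value' at the first colon; return None if there is no colon."""
    label, separator, value = line.partition(":")
    if not separator:
        return None
    return label.strip(), value.strip()

pair = split_label_value("Contact Number: (555) 234-5678")
no_pair = split_label_value("Loan Application Form")
print(pair)
print(no_pair)
```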

In [159]:
import pandas as pd
import re 

def clean_loan_application_file(text):
    cleaned_data = {}

    # Extract the category from the first three words
    category_match = re.search(r'(\w+\s+\w+\s+\w+)', text)
    if category_match:
        cleaned_data['Category'] = category_match.group(1)
    
    # Extract Applicant Information
    applicant_info = re.search(r'Applicant Information(.*?)Employment and Income Details', text, re.DOTALL)
    if applicant_info:
        applicant_info_text = applicant_info.group(1)
        cleaned_data['Applicant Information'] = {
            'Customer ID': re.search(r'Customer ID:\s*(.*?)Full Name:', applicant_info_text, re.DOTALL).group(1).strip(),
            'Full Name': re.search(r'Full Name:\s*(.*?)Date of Birth:', applicant_info_text, re.DOTALL).group(1).strip(),
            'Date of Birth': re.search(r'Date of Birth:\s*(.*?)Social Security Number:', applicant_info_text, re.DOTALL).group(1).strip(),
            'Social Security Number': re.search(r'Social Security Number:\s*(.*?)Contact Number:', applicant_info_text, re.DOTALL).group(1).strip(),
            'Contact Number': re.search(r'Contact Number:\s*(.*?)Email Address:', applicant_info_text, re.DOTALL).group(1).strip(),
            'Email Address': re.search(r'Email Address:\s*(.*?)Physical Address:', applicant_info_text, re.DOTALL).group(1).strip(),
            'Physical Address': re.search(r'Physical Address:\s*(.*)', applicant_info_text, re.DOTALL).group(1).strip(),
        }

    # Extract Loan Information
    loan_info = re.search(r'Loan Information(.*)', text, re.DOTALL)
    if loan_info:
        loan_info_text = loan_info.group(1)
        cleaned_data['Loan Information'] = {
            'Loan Amount Requested': re.search(r'Loan Amount Requested:\s*\$?(.*?)Purpose of Loan:', loan_info_text, re.DOTALL).group(1).strip(),
            'Purpose of Loan': re.search(r'Purpose of Loan:\s*(.*?)Loan Term Desired:', loan_info_text, re.DOTALL).group(1).strip(),
            'Loan Term Desired': re.search(r'Loan Term Desired:\s*(.*)', loan_info_text, re.DOTALL).group(1).strip(),
        }

    return cleaned_data

# Function to combine extracted tables and text
def process_loan_application(data):
    # Clean form recognizer result to extract structured tables and text
    cleaned_data = clean_form_recognizer_result(data)
    
    # Convert extracted tables to dataframes
    dataframes = tables_to_dataframes(cleaned_data["structured_tables"]) 
    # Combine all table dataframes into one
    combined_df = pd.concat(dataframes, ignore_index=True) 
    combined_df.columns = combined_df.iloc[0]
    combined_df = combined_df[1:]
    combined_df.reset_index(drop=True, inplace=True)
    combined_df.rename(columns={"Contact Number": "Employer Contact Number"}, inplace=True)
    combined_df = combined_df.dropna(how='all')

    # Clean the extracted text using regex
    combined_text = ' '.join(cleaned_data['text_data'])
    text_data = clean_loan_application_file(combined_text)

    def clean_loan_application(data):
        # Extract applicant and loan info
        applicant_info = data['Applicant Information']
        loan_info = data['Loan Information']
        
        # Combine keys and values for the two categories
        fields = list(applicant_info.keys()) + list(loan_info.keys())
        values = list(applicant_info.values()) + list(loan_info.values())
        
        # Create the 2x10 DataFrame without 'Category'
        df = pd.DataFrame({
            'Field': fields,
            'Value': values
        })
        
        return df.set_index('Field').T

    df_cleaned = clean_loan_application(text_data)

    # Convert the text data to a DataFrame
    text_df = pd.DataFrame(df_cleaned)

    # Concatenate the text dataframe with the tables dataframe
    final_df = pd.concat([text_df, combined_df], axis=1)

    def remove_empty_cells_and_push_up(df):
        for column in df.columns:
            non_empty_values = df[column].replace('', pd.NA).dropna().values
            df[column] = pd.Series(non_empty_values).reindex(df.index, fill_value='')
        return df

    return remove_empty_cells_and_push_up(final_df)

# Process the loan application
loanform_structured = process_loan_application(loanform).iloc[1:].reset_index(drop=True)
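# DataFrame.replace returns a new DataFrame, so the result must be assigned back
loanform_structured = loanform_structured.replace("Applicant's Signature:,", '', regex=True)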

loanform_structured.to_csv('loanform.csv', index=False)
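
One fragile point in clean_loan_application_file is that each `re.search(...).group(1)` raises `AttributeError` when a label is missing from the scanned form. A possible way to soften this, sketched below, is a small helper that returns `None` instead. `extract_between` is a hypothetical name, not part of the notebook, and the sample string is invented apart from the values already shown in this challenge.

```python
import re

def extract_between(text, start_label, end_label=None):
    """Return the text after start_label, up to end_label or the end of text.

    Returns None when the labels are not found, instead of raising.
    """
    if end_label is None:
        pattern = re.escape(start_label) + r"\s*(.*)"
    else:
        pattern = re.escape(start_label) + r"\s*(.*?)" + re.escape(end_label)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None

sample_text = (
    "Customer ID: 100002 Full Name: Jane Elizabeth Smith "
    "Contact Number: (555) 234-5678"
)

customer_id_field = extract_between(sample_text, "Customer ID:", "Full Name:")
full_name_field = extract_between(sample_text, "Full Name:", "Contact Number:")
contact_field = extract_between(sample_text, "Contact Number:")
email_field = extract_between(sample_text, "Email Address:", "Physical Address:")
print(customer_id_field, full_name_field, contact_field, email_field)
```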


## Pay Stub 

As part of some loan applications, the pay stub is a required document. A pay stub outlines the details of an employee’s income: the wages earned, the applicable deductions, the total gross pay, and the net pay for the pay period. It gives Contoso bank crucial information about a person’s income and employment stability, which helps assess their ability to repay the loan. It also verifies the applicant’s financial credibility and ensures that their reported income matches their actual earnings.

When processing a Pay Stub, we face challenges similar to those of the Loan Forms. These documents combine text and, unlike the previous use case, more than one table. Once again, ADI can extract these two types of entities separately.
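
It may help to see the shape we are aiming for before writing the code. The sketch below shows one possible target structure: the section names (`pay stub details`, `earnings`, `deductions`) and the column titles are the ones used by the code in this section, while the row labels and all values are invented for illustration.

```python
import json

# Hypothetical values; only the section names and column titles
# come from the code in this challenge.
paystub_target = {
    "pay stub details": {
        "Employee Name": "Jane Elizabeth Smith",
        "Customer ID": "100002",
    },
    "earnings": {
        "Regular Pay": {
            "Hours Worked": "80",
            "Rate": "25.00",
            "Current Earnings": "2,000.00",
            "Year-to-Date Earnings": "24,000.00",
        }
    },
    "deductions": {
        "Federal Tax": {
            "Current Amount": "200.00",
            "Year-to-Date Amount": "2,400.00",
        }
    },
}

print(json.dumps(paystub_target, indent=4))
```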

As we have already created the function that loads documents from a designated folder, all we have to do now is point it at the pay stub folder. We will retrieve a single Pay Stub to analyse.


In [160]:
paystub = read_json_files_from_blob("paystubs")  # TODO: remove this line so participants figure out what they are doing

In [166]:
def clean_form_recognizer_result(data):
    text_content = []
    
    for page in data.get("pages", []):
        for line in page.get("lines", []):
            # Check if the line contains the word "table"
            if "table" in line.get("text", "").lower():
                continue  # Skip the line if "table" is in the text
            # Keep only the "text" key
            line_keys = list(line.keys())
            for key in line_keys:
                if key != "text":
                    del line[key]
            # Collect the text content
            text_content.append(line.get("text", ""))
    
    # Create structured tables
    structured_tables = create_structured_tables(data.get("tables", []))
    
    # Concatenate all text content into a single string
    plain_text_content = " ".join(text_content)
    
    data["structured_tables"] = structured_tables
    data["plain_text_content"] = plain_text_content

    return data

def parse_pay_stub(pay_stub_text):
        # Dictionary to store parsed data
        parsed_data = {}

        # Regular expressions to match the required fields
        pay_stub_patterns = {
            'Company Name': r'^(.+?) Pay Stub for:',
            'Employee Name': r'Pay Stub for: (.+?) Pay Period:',
            'Pay Period': r'Pay Period: (.+?) Pay Date:',
            'Pay Date': r'Pay Date: (.+?) Employee ID:',
            'Employee ID': r'Employee ID: (.+?) Employee Information:',
            'Customer ID': r'Customer ID: (\d+)',
            'Employee Address': r'Address: (.+?), Social Security',
            'Social_Security': r'Social Security Number: (XXX-XX-\d{4})'
        }

        # Apply regex patterns and store matches in the dictionary
        for key, pattern in pay_stub_patterns.items():
            match = re.search(pattern, pay_stub_text)
            if match:
                parsed_data[key] = match.group(1)
        return parsed_data

def create_structured_tables(tables):
    structured_tables = []
    for table in tables:
        row_count = table.get("row_count", 0)
        column_count = table.get("column_count", 0)
        cells = table.get("cells", [])
        
        # Initialize an empty table
        structured_table = [["" for _ in range(column_count)] for _ in range(row_count)]
        
        # Populate the table with cell content
        for cell in cells:
            row_index = cell.get("row_index", 0)
            column_index = cell.get("column_index", 0)
            content = cell.get("content", "")
            structured_table[row_index][column_index] = content
        
        structured_tables.append(structured_table)
    
    return structured_tables

def tables_to_dataframes(structured_tables):
    dataframes = []
    for table in structured_tables:
        df = pd.DataFrame(table)
        dataframes.append(df)
    return dataframes



cleaned_data = clean_form_recognizer_result(paystub)
dataframes = tables_to_dataframes(cleaned_data["structured_tables"])

structured_data = {
    "pay stub details": parse_pay_stub(cleaned_data["plain_text_content"]),
}


df_list = []

def process_dataframe(df):
    result = {}
    columns = df.columns[1:]  # Ignore the first column
    for i in range(1, len(df)):  # Ignore the first row
        row_name = df.iloc[i, 0]
        result[row_name] = {}
        for col in columns:
            result[row_name][col] = f"{row_name} {col}: {df.at[i, col]}"
    return result

def rename_json_attributes(json_obj, attribute_titles):
    """
    Rename the keys of a JSON object based on the provided attribute titles.

    Parameters:
    json_obj (dict): The JSON object to rename.
    attribute_titles (dict): A dictionary where keys are the current attribute names and values are the new attribute names.

    Returns:
    dict: The updated JSON object with renamed keys.
    """
    updated_json = {}
    for old_key, new_key in attribute_titles.items():
        if old_key in json_obj:
            updated_json[new_key] = json_obj[old_key]
        else:
            updated_json[old_key] = json_obj.get(old_key, None)
    return updated_json

attribute_titles_earnings = {
    "1": "Hours Worked",
    "2": "Rate",
    "3": "Current Earnings",
    "4": "Year-to-Date Earnings"
}

attribute_titles_deductions = {
    "1": "Current Amount",
    "2": "Year-to-Date Amount"
}

# Process the earnings and deductions DataFrames
earnings_dict = process_dataframe(dataframes[0])
deductions_dict = process_dataframe(dataframes[1])
# Append the processed DataFrames to the JSON structure
structured_data["earnings"] = earnings_dict
structured_data["deductions"] = deductions_dict

def clean_pay_stub_section(data):
    # Check for 'deductions' and 'earnings' in the data
    for section in ['deductions', 'earnings']:
        if section in data:
            for key, values in data[section].items():
                # For each entry, clean up the values by removing everything before the colon
                for subkey in values:
                    # Split the string by colon and take the second part, stripping whitespace
                    values[subkey] = values[subkey].split(":")[1].strip()
    return data

structured_data = clean_pay_stub_section(structured_data)


def update_attribute_keys(data, section, key_mapping):
    # Ensure the section exists in the data (either "earnings" or "deductions")
    if section in data:
        # Iterate over each type within the earnings or deductions section
        for entry_type, attributes in data[section].items():
            # Create a new dictionary to store the updated attributes
            updated_attributes = {}
            
            # Loop through each attribute in that entry (e.g. 1, 2, 3)
            for old_key, value in attributes.items():
                # Map the old key (which is an integer) to the new descriptive key using key_mapping
                if str(old_key) in key_mapping:  # Convert old_key to string to match the mapping
                    new_key = key_mapping[str(old_key)]
                else:
                    new_key = old_key  # If no mapping is found, retain the old key
                
                # Update the dictionary with the new key
                updated_attributes[new_key] = value

            # Replace the old attributes with the updated attributes in the data
            data[section][entry_type] = updated_attributes

    return data


# update_attribute_keys modifies the dictionary in place and returns it
structured_data = update_attribute_keys(structured_data, "earnings", attribute_titles_earnings)
paystub_final = update_attribute_keys(structured_data, "deductions", attribute_titles_deductions)


# Save the updated JSON structure back to the file
with open('paystub_details.json', 'w') as json_file:
    json.dump(paystub_final, json_file, indent=4)

print("JSON file updated successfully.")

JSON file updated successfully.
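
The cell above reaches the final structure in three passes: it builds prefixed strings, strips the prefix again, and then renames the numeric column keys. As an alternative sketch, the same nested dictionary can be built in one pass from the raw table rows. `table_to_records` is a hypothetical helper, the sample row is invented, and the sketch assumes the first row is a header and the first column holds the row label, as process_dataframe does.

```python
def table_to_records(rows, column_titles):
    """Map each data row to {row label: {column title: cell value}}.

    rows[0] is treated as the header row and skipped; the first cell of
    every other row is used as that row's label.
    """
    records = {}
    for row in rows[1:]:
        label, values = row[0], row[1:]
        records[label] = dict(zip(column_titles, values))
    return records

# Invented sample table in the list-of-rows form built by create_structured_tables
earnings_rows = [
    ["Description", "Hours", "Rate", "Current", "YTD"],
    ["Regular Pay", "80", "25.00", "2,000.00", "24,000.00"],
]
earnings_titles = [
    "Hours Worked", "Rate", "Current Earnings", "Year-to-Date Earnings",
]

earnings = table_to_records(earnings_rows, earnings_titles)
print(earnings)
```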


## Loan Agreement


Now we get to the last part of our logical set of documents on a loan application process: the final loan agreement contract has been created and signed. A loan agreement contract is a legally binding document between a lender and a borrower that outlines the terms and conditions of a loan. This contract specifies the loan amount, interest rate, repayment schedule, and any other obligations or rights of both parties. It is crucial as it provides clarity and protection for both the lender and the borrower, ensuring that both parties understand their responsibilities and the consequences of default. Additionally, it serves as a legal record that can be referenced in case of disputes, helping to prevent misunderstandings and enforce the agreed-upon terms.

A loan agreement is, at its core, a text document without a fixed structure. We should therefore expect plain text as input and retrieve it as such.

As in the previous steps, let's call the function to retrieve the contents of the loanagreements folder, once again returning a single Loan Agreement.


#### Loan Agreements

In [162]:
loanagreement = read_json_files_from_blob("loanagreements")

In [163]:
import json
import re

def clean_json_data(json_data):
    # Extract relevant text content from the JSON
    content = []

    # Extract text from paragraphs
    paragraphs = json_data.get("paragraphs", [])
    for paragraph in paragraphs:
        content.append(paragraph.get("text", "").strip())

    # Extract text from pages and lines
    pages = json_data.get("pages", [])
    for page in pages:
        for line in page.get("lines", []):
            content.append(line.get("text", "").strip())

    # Join all text content into a single string with spaces between components
    plain_text_content = " ".join(content)

    # Extract Customer ID using regex
    pattern = r"Customer ID:\s*(\d+)"
    match = re.search(pattern, plain_text_content)
    customer_id = match.group(1) if match else None
    return plain_text_content, customer_id

# Clean the JSON data and extract Customer ID
loanagreement_structured, customer_id = clean_json_data(loanagreement)

# Print the cleaned data and Customer ID
print(json.dumps(loanagreement_structured, indent=2))
print(f"Customer ID: {customer_id}")

"LOAN AGREEMENT This Loan Agreement (\"Agreement\") is made and entered into on August 1, 2024, by and between: \u00b7 Lender: Horizon Bank Address: 123 Finance Avenue, Madison, WI 53703 Contact Number: (555) 123-4567 Email: lending@horizonbank.com \u00b7 Borrower: Jane Elizabeth Smith Customer ID: 100002 Address: 456 Oak Avenue, Unit 10, Madison, WI 53703 Contact Number: (555) 234-5678 Email: jane.smith90@example.com 1. Loan Amount and Purpose 1.1 Loan Amount: The Lender agrees to loan the Borrower the principal sum of $30,000.00 (thirty thousand dollars), referred to as the \"Loan.\" 1.2 Purpose of Loan: The Loan shall be used exclusively for the purchase of a vehicle, specifically a 2022 Toyota Camry. 2. Interest Rate 2.1 Interest Rate: The Loan shall bear interest at an annual fixed rate of 5.5%. 2.2 Accrual: Interest shall begin to accrue on the Loan from the date the funds are disbursed to the Borrower. 3. Loan Term 3.1 Term: The term of this Loan shall be 5 years (60 months), co
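
To see the Customer ID extraction in isolation, the sketch below applies the same regular expression to a short excerpt of the output above. `find_customer_id` is a hypothetical wrapper name; the pattern is the one used in clean_json_data.

```python
import re

def find_customer_id(text):
    """Return the digits that follow 'Customer ID:', or None if absent."""
    match = re.search(r"Customer ID:\s*(\d+)", text)
    return match.group(1) if match else None

excerpt = (
    "Borrower: Jane Elizabeth Smith Customer ID: 100002 "
    "Address: 456 Oak Avenue, Unit 10, Madison, WI 53703"
)

found_id = find_customer_id(excerpt)
absent_id = find_customer_id("Lender: Horizon Bank")
print(found_id)
print(absent_id)
```

Because the result can be `None`, any later step that uses the ID as a document key should handle that case explicitly.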

# Challenge 03: Data Modelling: From Retrieval to Upload (2/2)

## Upload to Cosmos - Loan Forms

In [164]:
import pandas as pd
import json
from azure.cosmos import CosmosClient, exceptions, PartitionKey
from dotenv import load_dotenv
import os
import uuid

# Load environment variables from .env file
load_dotenv()

endpoint = os.getenv("COSMOS_DB_ENDPOINT")
key = os.getenv("COSMOS_DB_KEY")
database_name = os.getenv("COSMOS_DB_DATABASE_NAME")
container_name = os.getenv("COSMOS_DB_CONTAINER_NAME")
file_name = "data2"

def upload_dataframe_to_cosmos_db(df, file_name, endpoint, key, database_name, container_name, partition_key_column):
    # Check if DataFrame is empty
    if df.empty:
        print("The DataFrame is empty. No data to upload.")
        return
    
    # Check if partition key column exists in DataFrame
    if partition_key_column not in df.columns:
        print(f"The partition key column '{partition_key_column}' does not exist in the DataFrame.")
        return
    
    # Initialize the Cosmos client
    client = CosmosClient(endpoint, key)
    
    try:
        # Create or get the database
        database = client.create_database_if_not_exists(id=database_name)
        
        # Create or get the container
        container = database.create_container_if_not_exists(
            id=container_name,
            partition_key=PartitionKey(path=f"/{partition_key_column}"),
            offer_throughput=400
        )
    except exceptions.CosmosHttpResponseError as e:
        print(f"An error occurred while creating the database or container: {e.message}")
        return
    
    # Convert DataFrame to JSON string
    json_content = df.to_json(orient='records')
    
    # Parse the JSON string
    json_data = json.loads(json_content)
    
    # Create documents with the JSON content and partition key
    for record in json_data:
        document = {
            'id': str(uuid.uuid4()),  # Generate a unique ID for each document
            'content': record,
            partition_key_column: record[partition_key_column]
        }
        
        # Upload the document to the container
        try:
            container.create_item(body=document)
            print(f"JSON content uploaded successfully with ID '{document['id']}' in Cosmos DB.")
        except exceptions.CosmosHttpResponseError as e:
            print(f"An error occurred while uploading the document: {e.message}")

# Example usage
upload_dataframe_to_cosmos_db(loanform_structured, "loanform1", endpoint, key, database_name, container_name, loanform_structured.columns[0])

An error occurred while creating the database or container: (BadRequest) Message: {"Errors":["The partition key component definition path '\/Customer ID' could not be accepted, failed near position '10'. Partition key paths must contain only valid characters and not contain a trailing slash or wildcard character."]}
ActivityId: 3b2de5fa-aee4-4a87-bfb7-2ebcb3b9b20e, Request URI: /apps/cef05ddd-1219-4cd3-b3f4-ff8fbe366a96/services/99db3e94-a0c1-43ec-89f9-86b2578f6f45/partitions/e9c36b19-24ea-42a3-bd30-571c81ea02e5/replicas/133691424048599213p, RequestStats: , SDK: Microsoft.Azure.Documents.Common/2.14.0
Code: BadRequest
Message: Message: {"Errors":["The partition key component definition path '\/Customer ID' could not be accepted, failed near position '10'. Partition key paths must contain only valid characters and not contain a trailing slash or wildcard character."]}
ActivityId: 3b2de5fa-aee4-4a87-bfb7-2ebcb3b9b20e, Request URI: /apps/cef05ddd-1219-4cd3-b3f4-ff8fbe366a96/services/99db3
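
The error above is raised because the partition key path is built from the first column name, `Customer ID`, and the service rejects `/Customer ID` as a path (the reported position points at the space). One possible fix, sketched below, is to normalise the column names before uploading so that the path becomes `/customer_id`. Both helper names are hypothetical, and the naming scheme (lowercase with underscores) is an assumption rather than a requirement.

```python
import re

def to_property_name(column_name):
    """Turn a column label such as 'Customer ID' into a path-safe property name."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", column_name.strip())
    return name.strip("_").lower()

def rename_record_keys(record):
    """Apply the same normalisation to every key of one record."""
    return {to_property_name(key): value for key, value in record.items()}

# Invented record in the shape produced by df.to_json(orient='records')
record = {"Customer ID": "100002", "Full Name": "Jane Elizabeth Smith"}

safe_record = rename_record_keys(record)
partition_key_path = "/" + to_property_name("Customer ID")
print(safe_record)
print(partition_key_path)
```

With pandas, the equivalent step would be renaming the DataFrame columns before calling upload_dataframe_to_cosmos_db, so that the document property and the partition key path agree.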

## Upload to Cosmos - Loan Agreements

In [165]:
from azure.cosmos import CosmosClient, exceptions, PartitionKey
from dotenv import load_dotenv
import os
import uuid

# Load environment variables from .env file
load_dotenv()

# Cosmos DB connection details from environment variables
endpoint = os.getenv("COSMOS_DB_ENDPOINT")
key = os.getenv("COSMOS_DB_KEY")
database_name = os.getenv("COSMOS_DB_DATABASE")
container_name = os.getenv("COSMOS_DB_CONTAINER")

def upload_text_to_cosmos_db(text_content, partition_key_value):
    # Check if the text is empty
    if not text_content:
        print("The text content is empty. No data to upload.")
        return
    
    # Initialize the Cosmos client
    client = CosmosClient(endpoint, key)
    
    try:
        # Create or get the database
        database = client.create_database_if_not_exists(id=database_name)
        
        # Create or get the container
        container = database.create_container_if_not_exists(
            id=container_name,
            # The partition key path names a document property, not a value;
            # 'customer_id' is the property name chosen here
            partition_key=PartitionKey(path="/customer_id"),
            offer_throughput=400
        )
    except exceptions.CosmosHttpResponseError as e:
        print(f"An error occurred while creating the database or container: {e.message}")
        return
    
    # Create a document with the text content and partition key
    document = {
        'id': str(partition_key_value),  # Use the customer ID as the document ID
        'content': text_content,  # Store the plain text as 'content'
        'customer_id': str(partition_key_value)  # Property referenced by the partition key path
    }
    
    # Upload the document to the container
    try:
        container.create_item(body=document)
        print(f"Text content uploaded successfully with ID '{document['id']}' in Cosmos DB.")
    except exceptions.CosmosHttpResponseError as e:
        print(f"An error occurred while uploading the document: {e.message}")

# Pass in the plain text and a partition key value
upload_text_to_cosmos_db(loanagreement_structured, customer_id)

An error occurred while creating the database or container: (BadRequest) Message: {"Errors":["The input name 'null' is invalid. Ensure to provide a unique non-empty string less than '255' characters.","The request payload is invalid. Ensure to provide a valid request payload."]}
ActivityId: c40737f9-0b68-4a4e-8c75-34d942f2465f, Request URI: /apps/cef05ddd-1219-4cd3-b3f4-ff8fbe366a96/services/99db3e94-a0c1-43ec-89f9-86b2578f6f45/partitions/e9c36b19-24ea-42a3-bd30-571c81ea02e5/replicas/133691424048599213p, RequestStats: 
RequestStartTime: 2024-09-08T12:52:08.6721927Z, RequestEndTime: 2024-09-08T12:52:08.6738012Z,  Number of regions attempted:1
{"systemHistory":[{"dateUtc":"2024-09-08T12:51:12.5417569Z","cpu":0.049,"memory":660240992.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0742,"availableThreads":32765,"minThreads":64,"maxThreads":32767},"numberOfOpenTcpConnection":219},{"dateUtc":"2024-09-08T12:51:22.5517950Z","cpu":0.039,"memory":660224896.000,"threadInfo
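
The message `The input name 'null' is invalid` is consistent with the database or container name being `None` when the request is sent. A plausible cause is that this cell reads `COSMOS_DB_DATABASE` and `COSMOS_DB_CONTAINER`, while the Loan Forms cell reads `COSMOS_DB_DATABASE_NAME` and `COSMOS_DB_CONTAINER_NAME`, so only one pair is likely defined in the `.env` file. A small guard such as the one sketched below would surface the problem before any call to Cosmos DB. `require_env` is a hypothetical helper, and the values in `fake_env` are invented.

```python
import os

def require_env(names, environ=None):
    """Return {name: value} for every requested variable.

    Raises ValueError listing all variables that are missing or empty.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}

# Invented environment that only defines the *_NAME variants
fake_env = {
    "COSMOS_DB_DATABASE_NAME": "loans",
    "COSMOS_DB_CONTAINER_NAME": "documents",
}

settings = require_env(
    ["COSMOS_DB_DATABASE_NAME", "COSMOS_DB_CONTAINER_NAME"], fake_env
)

try:
    require_env(["COSMOS_DB_DATABASE", "COSMOS_DB_CONTAINER"], fake_env)
    error_message = None
except ValueError as error:
    error_message = str(error)

print(settings)
print(error_message)
```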