<div id="singlestore-header" style="display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/database.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">InsureGPT: OpenAI and SingleStore for Insurance</h1>
    </div>
</div>

# InsureGPT Notebook Introduction

Welcome to the InsureGPT Jupyter Notebook, which shows how insurance claims, in particular car accident claims, can be processed and analyzed with less manual effort. The notebook combines SingleStore, OpenAI, and LangChain to automate claim analysis and document processing.

**Key Components:**

- **SingleStore:** A highly scalable, SQL-based, distributed database that excels in real-time analytics and is used here to store and manage the vectorized representation of documents. This facilitates rapid access and analysis, enabling more efficient retrieval of insurance claim details and supporting documents.

- **OpenAI:** At the heart of InsureGPT is the OpenAI API, leveraging models like GPT-4 for text and GPT-4-Vision for images. These models are utilized for a range of tasks, from analyzing images of car accidents to interpret and describe the damage, to processing various documents related to the claim. The OpenAI models help in understanding complex text and image data, providing insights that assist in making informed decisions on claims.

- **LangChain:** This framework is used to streamline the handling of language-related tasks, including document loading, text splitting, and embedding generation. LangChain's utilities support the efficient preparation and processing of large documents and texts, making them more manageable for analysis by machine learning models.

**Workflow Overview:**

1. **Document and Image Analysis:** The notebook begins with analyzing images related to car accidents and documents such as car repair estimates and insurance claims. Using OpenAI's GPT-4-Vision and GPT-4 models, it extracts and interprets detailed information from these sources.

2. **Data Processing:** The extracted information is then processed, which includes encoding images to base64 for analysis, text extraction from images, and detailed report generation based on the interpreted data.

3. **Vectorization and Storage:** With the help of LangChain, the textual data is split into manageable chunks and vectorized using embeddings. These vectorized documents are stored in SingleStore, enabling efficient querying and retrieval for analysis and reference.

4. **Report Generation:** The final step involves compiling the analyzed data into a comprehensive report that outlines the accident details, repair estimates, and insurance claim insights. This report can be used by insurance agents to make informed decisions regarding claim payouts.

This notebook is a practical example of applying AI in the insurance industry and of what automated, data-driven claim processing can look like. By combining SingleStore, OpenAI, and LangChain, InsureGPT aims to make the analysis of complex claim data faster and more consistent.

In [1]:
!pip install fpdf langchain langchain-community langchain-core singlestoredb openai

In this section, we prepare the environment by importing the Python libraries used throughout the notebook: FPDF for generating PDF documents, LangChain for loading text documents, splitting them into chunks, and generating embeddings, and the OpenAI client for text and image analysis. We also set the `SINGLESTOREDB_URL` environment variable, which the LangChain SingleStore vector store reads to connect to the database where the vectorized documents are stored. Finally, we initialize the OpenAI client with an API key so that later cells can call OpenAI's models to analyze images and process text. Replace both placeholder values with your own credentials before running the cell.
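
Hard-coding credentials in a notebook makes them easy to leak. As a minimal sketch, assuming the connection URL and API key are exported in the shell before the notebook starts, a small helper can read them from the environment and fail early when one is missing. The helper name `require_env` is hypothetical; `OPENAI_API_KEY` is the variable the OpenAI client reads by default.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


# Possible usage (assumes both variables were set outside the notebook):
# singlestore_url = require_env("SINGLESTOREDB_URL")
# client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```

Failing at setup time keeps a missing credential from surfacing later as an opaque connection or authentication error.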

In [2]:
from fpdf import FPDF
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OpenAIEmbeddings
from openai import OpenAI
import os
import base64
from langchain_community.vectorstores.singlestoredb import SingleStoreDB

os.environ["SINGLESTOREDB_URL"] = "<SINGLESTOREDB URL>"
client = OpenAI(api_key="YOUR OPENAI KEY")

This section of the notebook defines a series of functions aimed at analyzing images and documents using OpenAI's API, and generating PDF reports based on the analysis. Here's a breakdown of each function:

- `encode_image(image_path)`: This function takes the path of an image file as input, reads the file in binary mode, encodes it into base64 format, and then decodes it to a UTF-8 string. This encoding is necessary for transmitting the image data in a format that can be embedded within JSON payloads for API requests.

- `analyze_image(base64_image, prompt)`: Utilizes the OpenAI API to analyze an image given as a base64-encoded string. It sends a prompt alongside the image to the API, instructing it on the specific analysis or description needed for the image. The function then returns the API's textual response, which is an analysis or description of the image based on the prompt.

- `analyze_document(base64_image)`: Similar to `analyze_image`, this function sends a base64-encoded image to the OpenAI API for analysis. However, it specifically asks the API to treat the image as a document, extracting and converting the image's text content into formatted text output. This is particularly useful for processing scanned documents or images containing textual information.

- `analyze_accident(accident_report, car_repair_details, insurance_claim)`: This function represents a more complex analysis scenario where multiple pieces of information (e.g., an accident report, car repair details, and an insurance claim document) are synthesized to infer the type of damage and estimate repair costs. It showcases the potential for using OpenAI's API to perform detailed analysis and decision-making tasks based on textual data.

- `create_pdf(text, filename="Accident_Analysis_Report.pdf")`: Generates a PDF document containing the specified text. This function initializes a PDF document, sets the font, adds the text content as a multi-cell (which automatically handles text wrapping), and saves the document with the given filename. This is useful for creating formal reports from the analyzed data, such as accident analysis reports that can be shared or archived.

The three `analyze_*` functions call the OpenAI API to process image and text data, while `encode_image` and `create_pdf` are local helpers that prepare the input and format the output. Together they show how tasks that traditionally require manual effort, such as document transcription, accident report generation, and insurance claim assessment, can be automated.
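
One detail worth noting: `analyze_image` and `analyze_document` label every payload as `image/jpeg`, while the workflow also passes `.png` files. The following is a minimal sketch of a hypothetical `to_data_url` helper that derives the MIME type from the file extension with the standard `mimetypes` module. It is an illustration under that assumption, not part of the original notebook.

```python
import base64
import mimetypes


def to_data_url(image_path: str) -> str:
    """Build a base64 data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could be passed directly as the `url` value of an `image_url` content part.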

In [3]:
# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')



def analyze_image(base64_image, prompt):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    response_text = response.choices[0].message.content
    return response_text


def analyze_document(base64_image):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "system",
                "content": "You are a scribe. Convert this image text to formatted text output. Output nothing else.",
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    response_text = response.choices[0].message.content
    return response_text


def analyze_accident(accident_report, car_repair_details, insurance_claim):
    # The three documents are inputs to the model, so they are sent as user messages.
    response = client.chat.completions.create(
        model="gpt-4-0125-preview",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a car insurance agent. Your job is to analyze the following documents, "
                    "including an insurance claim document, a detailed description of the damage, "
                    "and a CSV of the car parts and prices. Your goal is to use this information to "
                    "infer the type of damage and make a payout estimate based on the CSV."
                ),
            },
            {"role": "user", "content": accident_report},
            {"role": "user", "content": car_repair_details},
            {"role": "user", "content": insurance_claim},
        ],
    )
    return response.choices[0].message.content


def create_pdf(text, filename="Accident_Analysis_Report.pdf"):
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    pdf.multi_cell(0, 10, text)
    pdf.output(filename)
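
One caveat for `create_pdf`: FPDF's built-in fonts such as Arial are limited to the Latin-1 character set, and model output often contains characters outside it (curly quotes, dashes, bullets). The following is a minimal sketch of a hypothetical `to_latin1` helper that could be applied to the text before it reaches `multi_cell`. The replacement table is an illustrative assumption, not an exhaustive mapping.

```python
def to_latin1(text: str) -> str:
    """Replace common typographic characters, then drop anything Latin-1 cannot encode."""
    replacements = {
        "\u2018": "'", "\u2019": "'",   # curly single quotes
        "\u201c": '"', "\u201d": '"',   # curly double quotes
        "\u2013": "-", "\u2014": "-",   # en and em dashes
        "\u2022": "*",                  # bullet
    }
    for source, target in replacements.items():
        text = text.replace(source, target)
    # Any remaining unsupported character becomes "?" instead of raising an error.
    return text.encode("latin-1", errors="replace").decode("latin-1")
```

For example, `create_pdf(to_latin1(final_accident_analysis))` would trade a few substituted characters for a report that renders without an encoding error.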

This final section of the notebook demonstrates the complete workflow of processing and analyzing a car accident scenario, from initial image analysis to generating a final accident analysis report, and then storing the processed documents in a vectorized format for efficient retrieval.

1. **Image Encoding and Analysis:**
    - `car_accident_image = encode_image("car-accident.jpg")`: Encodes the car accident image to base64 format for analysis.
    - `accident_report = analyze_image(...)`: Analyzes the encoded image to generate a detailed report on the car accident, describing the visible damage.

2. **Document Analysis for Repair Details and Insurance Claim:**
    - Images of car repair estimates and the insurance claim are encoded and analyzed to extract text-based information. This step converts visual data (scanned documents or images) into structured text data that can be further analyzed or stored.

3. **Accident Analysis:**
    - Combining the accident report, car repair details, and insurance claim document, a comprehensive analysis is performed to infer the type of damage and estimate repair costs.

4. **Report Generation (Optional):**
    - The call to `create_pdf` is commented out in the code; uncomment it to generate a PDF report of the final accident analysis, suitable for sharing or archiving.

5. **Storing Documents in Vectorized Format:**
    - The workflow concludes by loading an insurance claim document (`insurance_claim.txt`), splitting it into manageable chunks, generating embeddings for each chunk, and storing these vectorized documents in a SingleStore database. This process facilitates efficient retrieval and analysis of the documents later, using techniques like Retrieval-Augmented Generation (RAG) or Approximate Nearest Neighbor (ANN) search. The `SingleStoreDB.from_documents` function showcases how text data, once processed and vectorized, can be organized in a database, leveraging SingleStore's capabilities for real-time analytics.

This section encapsulates the end-to-end process of leveraging AI and machine learning for automating the analysis of car accident reports, repair estimates, and insurance claims. It illustrates the integration of image and text analysis with data storage solutions to streamline and enhance the efficiency of insurance claim processing.
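
To make the roles of `chunk_size` and `chunk_overlap` in step 5 concrete, here is a simplified fixed-width splitter. It is a sketch for illustration only: the function `split_fixed` is hypothetical, and LangChain's `CharacterTextSplitter` splits on a separator and merges the pieces, so its output will generally differ.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks
```

With `chunk_overlap=0`, as in the notebook, the chunks are disjoint; a small overlap can help when a relevant sentence would otherwise be cut at a chunk boundary.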

In [4]:
car_accident_image = encode_image("car-accident.jpg")
accident_report = analyze_image(car_accident_image, "Visually describe this car accident. Explain the damage on "
                                                    "the car. Return a detailed accident report")

print(accident_report)

car_repair_details = analyze_document(encode_image("car_repair_estimate.png"))
print(car_repair_details)

insurance_claim = analyze_document(encode_image("insurance_claim.png"))
print(insurance_claim)

final_accident_analysis = analyze_accident(accident_report, car_repair_details, insurance_claim)
print(final_accident_analysis)
# create_pdf(final_accident_analysis) # if you want to create a PDF out of the final accident analysis

# store documents in vectorized format for RAG, quick search (ANN), etc.
loader = TextLoader("insurance_claim.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
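# OpenAIEmbeddings reads the key from the OPENAI_API_KEY environment variable
# unless openai_api_key is passed explicitly.
embeddings = OpenAIEmbeddings()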

db = SingleStoreDB.from_documents(
    docs,
    embeddings,
    table_name="insurance_claim",  # use table with a custom name
)
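
Once the chunks are stored, they can be retrieved by semantic similarity. The sketch below assumes `store` is a LangChain vector store such as the `db` object created above and uses its `similarity_search` method. The helper name `retrieve_context` and the example question are hypothetical.

```python
def retrieve_context(store, question: str, k: int = 3) -> str:
    """Return the k most similar stored chunks, joined into one context string."""
    documents = store.similarity_search(question, k=k)
    return "\n\n".join(doc.page_content for doc in documents)


# Possible usage against the table populated above:
# context = retrieve_context(db, "What damage does the claimant report?")
# print(context)
```

The returned context could then be added to a chat prompt, which is the retrieval step of a RAG pipeline.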

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>