<a href="https://colab.research.google.com/github/sushantagarwal29/ragpoc/blob/main/llamaparse_ExtractPOC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Parsing Complex PDFs with LlamaParse

##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).

> [KDB.AI](https://kdb.ai/) is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.

PDFs and other complex document types are notoriously difficult to work with, yet they are among the most common formats for publishing important business-related information. Because these file types are so common, it is essential to parse and ingest them quickly and accurately while cleanly extracting embedded entities such as images, tables, and graphs. If extracted correctly, all of the data held in a complex document like a PDF can be ingested into a RAG workflow to generate accurate, contextual responses for users and the business.

This sample illustrates how to use LlamaParse, a generative-AI-enabled parsing platform created by LlamaIndex, to parse and represent complex files in a way that enables effective retrieval. We use LlamaIndex to orchestrate a RAG pipeline in which LlamaParse parses a complex academic article and extracts its text and tables, and KDB.AI serves as the retrieval mechanism that passes relevant information about the article to an LLM.

LlamaParse transforms complex documents like PDFs into markdown or text formats, which are easily ingestible. This parsing also extracts embedded entities like tables and images.

Agenda:
1. Dependencies, Imports & Setup
2. Set API Keys for LlamaCloud, OpenAI, Cohere
3. Define KDB.AI Session
4. Create Schema and KDB.AI Table
5. Download ARXIV Article: '[LLM In-Context Recall is Prompt Dependent](https://arxiv.org/pdf/2404.08865)' by Daniel Machlab and Rick Battle
6. LlamaParse & LlamaIndex Setup
7. Parse the Document with LlamaParse into Markdown Format
8. Extract Text and Table nodes from Markdown Document
9. Create the RAG Pipeline with LlamaIndex and KDB.AI
10. Query the RAG Pipeline!

## 1. Dependencies, Imports & Setup

In order to successfully run this sample, note the following steps depending on where you are running this notebook:

- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.

- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells.

In [None]:
!pip install llama-index
!pip install llama-index-core
!pip install llama-index-embeddings-openai
!pip install llama-parse
!pip install llama-index-vector-stores-kdbai
!pip install pandas
!pip install llama-index-postprocessor-cohere-rerank
!pip install kdbai_client

Collecting llama-index
  Downloading llama_index-0.12.2-py3-none-any.whl.metadata (11 kB)
Collecting llama-index-agent-openai<0.5.0,>=0.4.0 (from llama-index)
  Downloading llama_index_agent_openai-0.4.0-py3-none-any.whl.metadata (726 bytes)
Collecting llama-index-cli<0.5.0,>=0.4.0 (from llama-index)
  Downloading llama_index_cli-0.4.0-py3-none-any.whl.metadata (1.5 kB)
Collecting llama-index-core<0.13.0,>=0.12.2 (from llama-index)
  Downloading llama_index_core-0.12.2-py3-none-any.whl.metadata (2.5 kB)
Collecting llama-index-embeddings-openai<0.4.0,>=0.3.0 (from llama-index)
  Downloading llama_index_embeddings_openai-0.3.1-py3-none-any.whl.metadata (684 bytes)
Collecting llama-index-indices-managed-llama-cloud>=0.4.0 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-0.6.3-py3-none-any.whl.metadata (3.8 kB)
Collecting llama-index-legacy<0.10.0,>=0.9.48 (from llama-index)
  Downloading llama_index_legacy-0.9.48.post4-py3-none-any.whl.metadata (8.5 kB)
Collecting 

In [None]:
!pip install -U llama-index-llms-azure-inference
!pip install -U llama-index-embeddings-azure-inference

Collecting llama-index-llms-azure-inference
  Downloading llama_index_llms_azure_inference-0.3.0-py3-none-any.whl.metadata (1.7 kB)
Collecting azure-ai-inference>=1.0.0b5 (from llama-index-llms-azure-inference)
  Downloading azure_ai_inference-1.0.0b6-py3-none-any.whl.metadata (31 kB)
Collecting azure-identity<2.0.0,>=1.15.0 (from llama-index-llms-azure-inference)
  Downloading azure_identity-1.19.0-py3-none-any.whl.metadata (80 kB)
Collecting isodate>=0.6.1 (from azure-ai-inference>=1.0.0b5->llama-index-llms-azure-inference)
  Downloading isodate-0.7.2-py3-none-any.whl.metadata (11 kB)
Collecting azure-core>=1.30.0 (from azure-ai-inference>=1.0.0b5->llama-index-llms-azure-inference)
  Downloading azure_core-1.32.0-py3-none-any.whl.metadata (39 kB)
Collecting msal>=1.30.0 (from azure-identity<2.0.0,>=1.15.0->llama-index-llms-azure-inference)
  Downloading msal-1.31.1-

In [None]:
from llama_index.llms.azure_inference import AzureAICompletionsModel

In [None]:
from llama_index.embeddings.azure_inference import AzureAIEmbeddingsModel

In [None]:
from llama_parse import LlamaParse
from llama_index.core import Settings
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.kdbai import KDBAIVectorStore
from llama_index.postprocessor.cohere_rerank import CohereRerank
from getpass import getpass
import os
import kdbai_client as kdbai


## 2. Set API Keys for LlamaCloud, OpenAI, Cohere
Get API keys here:
- [LlamaCloud](https://cloud.llamaindex.ai/)
- [OpenAI](https://platform.openai.com/api-keys)
- [Cohere](https://dashboard.cohere.com/welcome/register)

In [None]:
# llama-parse is async-first, running the async code in a notebook requires the use of nest_asyncio
import nest_asyncio
nest_asyncio.apply()

In [None]:
# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = (
    os.environ["LLAMA_CLOUD_API_KEY"]
    if "LLAMA_CLOUD_API_KEY" in os.environ
    else getpass("LLAMA CLOUD API key: ")
)

LLAMA CLOUD API key: ··········


In [None]:
# Using OpenAI API for embeddings/llms
os.environ["OPENAI_API_KEY"] = (
    os.environ["OPENAI_API_KEY"]
    if "OPENAI_API_KEY" in os.environ
    else getpass("OpenAI API Key: ")
)

OpenAI API Key: ··········


In [None]:
# Using Cohere for reranking
os.environ["COHERE_API_KEY"] = (
    os.environ["COHERE_API_KEY"]
    if "COHERE_API_KEY" in os.environ
    else getpass("COHERE API key: ")
)

COHERE API key: ··········
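
The three cells above repeat the same pattern: use an environment variable if it is set, otherwise prompt for it. A minimal sketch of a reusable helper follows. `ensure_env` and its `prompt_fn` parameter are hypothetical names introduced here for illustration; they are not part of any library used in this notebook.

```python
import os
from getpass import getpass


def ensure_env(name, prompt_fn=getpass):
    """Return os.environ[name], prompting for it (and storing it) only if it is unset."""
    value = os.environ.get(name)
    if not value:
        # Fall back to an interactive prompt; getpass hides the typed value.
        value = prompt_fn(f"{name}: ")
        os.environ[name] = value
    return value
```

With this in place, each key cell could shrink to a single call such as `ensure_env("COHERE_API_KEY")`. Looking up and testing the same variable name in one place also avoids mismatches between the name that is checked and the name that is read.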


## 3. Define KDB.AI Session
KDB.AI comes in two offerings:

- KDB.AI Cloud - For experimenting with smaller generative AI projects with a vector database in our cloud.
- KDB.AI Server - For evaluating large scale generative AI applications on-premises or on your own cloud provider.

Depending on which you use, there will be different setup steps and connection details required.

### Option 1. KDB.AI Cloud
To use KDB.AI Cloud, you will need two session details: a URL endpoint and an API key. To get these, you can [sign up for free](https://kdb.ai/get-started).

You can connect to a KDB.AI Cloud session using `kdbai.Session`, passing the session URL endpoint and API key from your KDB.AI Cloud portal.

If the environment variables `KDBAI_ENDPOINT` and `KDBAI_API_KEY` exist on your system and contain your KDB.AI Cloud portal details, they are used automatically to connect. If they do not exist, you are prompted to enter your KDB.AI Cloud portal session URL endpoint and API key.

In [None]:
#Set up KDB.AI endpoint and API key
KDBAI_ENDPOINT = (
    os.environ["KDBAI_ENDPOINT"]
    if "KDBAI_ENDPOINT" in os.environ
    else input("KDB.AI endpoint: ")
)
KDBAI_API_KEY = (
    os.environ["KDBAI_API_KEY"]
    if "KDBAI_API_KEY" in os.environ
    else getpass("KDB.AI API key: ")
)

KDB.AI endpoint: https://cloud.kdb.ai/instance/5mpjbggrkg
KDB.AI API key: ··········


In [None]:
#connect to KDB.AI
session = kdbai.Session(api_key=KDBAI_API_KEY, endpoint=KDBAI_ENDPOINT)

### Option 2. KDB.AI Server
To use KDB.AI Server, you will need to download and run your own container. To do this, you will first need to sign up for free.

You will receive an email with the required license file and bearer token needed to download your instance. Follow instructions in the signup email to get your session up and running.

Once the setup steps are complete you can then connect to your KDB.AI Server session using kdbai.Session and passing your local endpoint.

In [None]:
# session = kdbai.Session(endpoint="http://localhost:8082")

## 4. Create Schema and KDB.AI Table

In [None]:
schema = [
        dict(name="document_id", type="bytes"),
        dict(name="text", type="bytes"),
        dict(name="embeddings", type="float32s"),
    ]

indexFlat = {
        "name": "flat",
        "type": "flat",
        "column": "embeddings",
        "params": {'dims': 1536, 'metric': 'L2'},
    }
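
The `dims` value of the index has to equal the length of the vectors produced by the embedding model; a mismatch is likely to surface as an error when nodes are inserted. The sketch below shows one way to check this before the table is created. `EMBEDDING_DIMS` and `check_index_dims` are illustrative names, and the sizes listed are the default output dimensions of the two OpenAI embedding models named later in this notebook.

```python
# Default output sizes of the OpenAI embedding models used in this notebook.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_index_dims(index, model_name):
    """Raise ValueError if the index `dims` differs from the model's vector size."""
    expected = EMBEDDING_DIMS[model_name]
    actual = index["params"]["dims"]
    if actual != expected:
        raise ValueError(
            f"index dims={actual}, but {model_name} produces {expected}-dim vectors"
        )
    return actual


# Stand-in for the indexFlat definition above.
example_index = {
    "name": "flat",
    "type": "flat",
    "column": "embeddings",
    "params": {"dims": 1536, "metric": "L2"},
}
check_index_dims(example_index, "text-embedding-3-small")  # passes silently
```

Running the same check with `"text-embedding-3-large"` raises a `ValueError`, which is the situation to watch for when switching embedding models in step 6.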

In [None]:
# Connect with kdbai database
db = session.database("default")

In [None]:
KDBAI_TABLE_NAME = "LlamaParse_Table"

# First ensure the table does not already exist
try:
    db.table(KDBAI_TABLE_NAME).drop()
except kdbai.KDBAIException:
    pass

#Create the table
table = db.create_table(KDBAI_TABLE_NAME, schema, indexes=[indexFlat])

## 5. Download ARXIV Article
This is an article from the VMware NLP Lab called '[LLM In-Context Recall is Prompt Dependent](https://arxiv.org/pdf/2404.08865)' by Daniel Machlab and Rick Battle.

In [None]:
!wget 'https://arxiv.org/pdf/2404.08865' -O './LLM_recall.pdf'

--2024-12-03 12:47:21--  https://arxiv.org/pdf/2404.08865
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.195.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4601949 (4.4M) [application/pdf]
Saving to: ‘./LLM_recall.pdf’


2024-12-03 12:47:21 (46.7 MB/s) - ‘./LLM_recall.pdf’ saved [4601949/4601949]
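
Before sending a file to the parser, it can be worth confirming that the download really is a PDF, since a server can answer successfully with an HTML page instead. A minimal standard-library sketch, with `looks_like_pdf` as a hypothetical helper name:

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file starts with the PDF magic bytes `%PDF-`."""
    with Path(path).open("rb") as fh:
        return fh.read(5) == b"%PDF-"
```

For the file saved above, the check would be `looks_like_pdf("./LLM_recall.pdf")`.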



## 6. LlamaParse & LlamaIndex Setup
We define which LLM and embedding model should be used, define the file path of the complex document, and create parsing instructions.

Using OpenAI LLM and embedding models via Azure AI Foundry

In [None]:
llm = AzureAICompletionsModel(
    endpoint="https://ai-depoc1aihub1643128651037.openai.azure.com/openai/deployments/gpt-4o",
    # Never hardcode secrets in a notebook; AZURE_INFERENCE_CREDENTIAL is an assumed variable name
    credential=os.environ.get("AZURE_INFERENCE_CREDENTIAL") or getpass("Azure AI credential: "),
    api_version="2024-08-01-preview",
)

In [None]:
embed_model = AzureAIEmbeddingsModel(
    endpoint="https://ai-depoc1aihub1643128651037.openai.azure.com/openai/deployments/text-embedding-3-small",
    # Never hardcode secrets in a notebook; AZURE_INFERENCE_CREDENTIAL is an assumed variable name
    credential=os.environ.get("AZURE_INFERENCE_CREDENTIAL") or getpass("Azure AI credential: "),
    model_name="text-embedding-3-small",
)

In [None]:
Settings.llm = llm
Settings.embed_model = embed_model

Using OpenAI models directly (alternative to the Azure setup)

In [None]:
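# Alternative to the Azure setup above: call OpenAI directly.
# Note: text-embedding-3-large returns 3072-dimensional vectors by default,
# so the index `dims` in step 4 (1536) must be changed to match if this cell is used.
EMBEDDING_MODEL  = "text-embedding-3-large"
GENERATION_MODEL = "gpt-4o-mini"

# Uncomment to use the OpenAI LLM instead of the Azure-hosted one defined above
#llm = OpenAI(model=GENERATION_MODEL)
embed_model = OpenAIEmbedding(model=EMBEDDING_MODEL)

Settings.llm = llm
Settings.embed_model = embed_model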

In [None]:
# This run parses a local contract PDF; the file must exist in the working directory
pdf_file_name = './1.3.2.6.2_Honeywell Long Term Contract_C14.pdf'

In [None]:
parsing_instructions = '''The attached documents are legal, long-term agreement contracts. Answer questions using the information in these documents and be precise.'''

## 7. Parse the document with LlamaParse into markdown format

In [None]:
# Note: LlamaParse's keyword argument is `parsing_instruction` (singular)
documents = LlamaParse(result_type="markdown", disable_ocr=True, target_pages="0,2,6", parsing_instruction=parsing_instructions).load_data(pdf_file_name)

Started parsing the file under job_id 893867ba-a0c1-4453-ba57-2e0bc29ec19e
..
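
`target_pages` takes a comma-separated string of page indices. LlamaParse documents these as zero-based, so `"0,2,6"` selects the first, third, and seventh pages of the PDF. A small helper for building that string from the 1-based page numbers shown in a PDF viewer might look like the sketch below; `to_target_pages` is a hypothetical name, not part of the `llama_parse` package.

```python
def to_target_pages(pages):
    """Convert 1-based page numbers to a zero-based, comma-separated string."""
    if any(p < 1 for p in pages):
        raise ValueError("page numbers are 1-based and must be >= 1")
    # Sort and de-duplicate so the string is stable regardless of input order.
    return ",".join(str(p - 1) for p in sorted(set(pages)))


print(to_target_pages([1, 3, 7]))  # prints 0,2,6
```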

In [None]:
print(documents[0].text[:1000])

# Honeywell

# THE POWER OF CONNECTED

# STAND-ALONE GOVERNMENT PROGRAM CONTRACT

BETWEEN HONEYWELL INTERNATIONAL INC.

AND

UNITED AVIONICS INC.

# STAND-ALONE GOVERNMENT PROGRAM CONTRACT: DEF10177

PROGRAM: TIGER Ill

PRIME CONTRACT NUMBER: W56HZV-20-D-0062

PERIOD OF PERFORMANCE: 10/1/2020 - 9/30/2025

CONTRACT TYPE: FFP

..L_Honeywell

~;; ,,.:

~Supplier

(v01-2021) Stand-Alone Government Program Contract

Honeywell Confidential


## 8. Extract Text and Table nodes from Markdown Document

In [None]:
# Parse the documents using MarkdownElementNodeParser
# (calling .from_defaults() on the instance would build a new parser and discard llm/num_workers)
node_parser = MarkdownElementNodeParser(llm=llm, num_workers=8)

In [None]:
# Retrieve nodes (text) and objects (table)
nodes = node_parser.get_nodes_from_documents(documents)

0it [00:00, ?it/s]
0it [00:00, ?it/s]
4it [00:00, 25771.45it/s]


#### Split nodes into base_nodes (text nodes) and objects (table nodes)

In [None]:
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
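
To make the split concrete, here is a deliberately simplified, standard-library sketch of separating pipe tables from prose in a markdown string. It is not how `MarkdownElementNodeParser` is implemented (the real parser also asks the LLM to summarize each table); `split_markdown` is a hypothetical helper that only illustrates the idea of text elements versus table elements.

```python
def split_markdown(markdown):
    """Split markdown into (text_blocks, table_blocks) using a pipe-table heuristic."""
    text_blocks, table_blocks = [], []
    current, in_table = [], False

    def flush():
        # Store the accumulated lines in the list matching their kind.
        if current:
            target = table_blocks if in_table else text_blocks
            target.append("\n".join(current))
            current.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        is_table_row = (
            len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|")
        )
        if not stripped:
            # A blank line ends whatever block was being collected.
            flush()
            in_table = False
            continue
        if is_table_row != in_table:
            # Switching between prose and table rows starts a new block.
            flush()
            in_table = is_table_row
        current.append(stripped)
    flush()
    return text_blocks, table_blocks
```

Calling `split_markdown(documents[0].text)` would return the prose blocks and the raw table blocks separately, which is roughly the distinction between `base_nodes` and `objects` above.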

#### Explore these extracted nodes

In [None]:
print(base_nodes[6].text)

Option Year 3 (October 1, 2024 - September 30, 2025 Delivery Dates) - to be exercised by Honeywell upon award by USG


In [None]:
print(objects)

[IndexNode(id_='219050ab-c33c-4f3a-81fb-c2dfee830252', embedding=None, metadata={'col_schema': 'Column: Part Number\nType: string\nSummary: Unique identifier for each part.\n\nColumn: Description\nType: string\nSummary: Brief description of the part.\n\nColumn: Lead Time\nType: integer\nSummary: Time required to deliver the part (in days).\n\nColumn: Capacity\nType: integer\nSummary: Maximum number of parts that can be produced.\n\nColumn: Quantity\nType: integer\nSummary: Number of parts ordered.\n\nColumn: Unit Price\nType: string\nSummary: Price per unit of the part.\n\nColumn: Extended Price\nType: string\nSummary: Total price for the ordered quantity.\n\nColumn: Item\nType: string\nSummary: Indicates if the item is included in the order.\n\nColumn: Award\nType: string\nSummary: Indicates if the item has been awarded.\n\nColumn: Truthful Cost Data Applies\nType: string\nSummary: Indicates if truthful cost data applies to the item.'}, excluded_embed_metadata_keys=['col_schema'], exc

In [None]:
print(objects[3].obj.text)

This table provides details about various electrical harness assemblies, including their part numbers, descriptions, lead times, capacities, quantities, unit prices, extended prices, and other relevant information.,
with the following columns:
- Part Number: Unique identifier for each harness assembly.
- Description: Description of the harness assembly.
- Lead Time: Time required to deliver the harness assembly (in days).
- Capacity: Production capacity for the harness assembly.
- Quantity: Number of units ordered.
- Unit Price: Price per unit of the harness assembly.
- Extended Price: Total price for the ordered quantity.
- Item: Indicates if the item is included in the order.
- Award: Indicates if the item has been awarded.
- Truthful Cost Data Applies: Indicates if truthful cost data applies to the item.

|Part Number|Description|Lead Time|Capacity|Quantity|Unit Price|Extended Price|Item|Award|Truthful Cost Data Applies|
|---|---|---|---|---|---|---|---|---|---|
|3-300-952-01|HARNES

## 9. Create the RAG Pipeline with LlamaIndex and KDB.AI

Use KDB.AI as the vector store, insert `base_nodes` and `objects` into it, and create a `query_engine` that uses Cohere for reranking.

In [None]:
vector_store = KDBAIVectorStore(table)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [None]:
# Create the index, which inserts base_nodes and objects into KDB.AI
recursive_index = VectorStoreIndex(
    nodes=base_nodes + objects,
    storage_context=storage_context,
)

In [None]:
# Query KDB.AI to ensure the nodes were inserted
table.query()

Unnamed: 0,document_id,text,embeddings
0,b'14d6d0f5-e1fc-4963-937a-95e5fbbc50de',b'Honeywell\n\n THE POWER OF CONNECTED\n\n STA...,"[-0.045199413, -0.054903533, 0.044602636, 0.02..."
1,b'a0d4464a-af0f-4630-8bba-fcd52d7ff9d4',b'Honeywell AEROSPACE SOURCING\n\n THE POWER O...,"[-0.03446311, -0.010401975, 0.079065524, 0.019..."
2,b'58731308-d45b-4f79-94cb-cb04864af76d',"b'""Military End Uses"" includes use of an item ...","[-0.0020543735, 0.035638258, 0.07917821, 0.022..."
3,b'b3b02b06-96a7-4575-a897-2297ce66e8db',b'Honeywell AEROSPACE SOURCING\n\n THE POWER O...,"[-0.020578463, -0.029511936, 0.086429544, -0.0..."
4,b'7b063cbd-6aea-4123-8143-085620e43062',"b'Option Year 1 (October 1, 2022 - September 3...","[-0.043066006, 0.040384337, 0.04064985, 0.0292..."
5,b'8b6f9136-fe47-4b2a-94d8-e422e6c205a8',"b'Option Year 2 (October 1, 2023 - September 3...","[-0.038107604, 0.04094315, 0.036888584, 0.0136..."
6,b'feccfb03-1b35-48c6-84e5-619db04a4885',"b'Option Year 3 (October 1, 2024 - September 3...","[-0.043578796, 0.045119997, 0.04349908, 0.0172..."
7,b'219050ab-c33c-4f3a-81fb-c2dfee830252',b'This table lists various electrical harness ...,"[-0.06300706, -0.02227639, 0.023460347, 0.0037..."
8,b'a6e619fc-c85f-4d2b-b4e5-a1976f176ff5',b'This table provides details on various elect...,"[-0.05525397, -0.024775168, 0.023196483, 0.006..."
9,b'1feadb61-fcde-4f7e-9fb8-db3d0e82efc8',b'This table lists various electrical harness ...,"[-0.06369985, -0.020310473, 0.01768434, -0.000..."


In [None]:
# Define the reranker
cohere_rerank = CohereRerank(top_n=10)

# Create the query_engine to execute the RAG pipeline using LlamaIndex, KDB.AI, and the Cohere reranker
query_engine = recursive_index.as_query_engine(
    similarity_top_k=20,
    node_postprocessors=[cohere_rerank],
    vector_store_kwargs={"index": "flat"},
)
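
The query engine above retrieves in two stages: the vector store returns the 20 nearest nodes (`similarity_top_k=20`), and the reranker keeps the best 10 of those (`top_n=10`). The toy sketch below mirrors that flow in plain Python. The L2 distance matches the `metric` of the index defined in step 4; the keyword-overlap `rerank_score` is only a stand-in for Cohere's model, and every name in the block is illustrative.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rerank_score(query, text):
    """Stand-in relevance score: number of query words that appear in the text."""
    words = set(text.lower().split())
    return sum(1 for w in query.lower().split() if w in words)


def retrieve_then_rerank(query, query_vec, docs, similarity_top_k, top_n):
    """docs is a list of (text, vector). Returns the top_n texts after reranking."""
    # Stage 1: nearest neighbours by L2 distance (smaller is closer).
    candidates = sorted(docs, key=lambda d: l2(query_vec, d[1]))[:similarity_top_k]
    # Stage 2: reorder the candidates by relevance score (larger is better).
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d[0]), reverse=True)
    return [text for text, _ in reranked[:top_n]]


docs = [
    ("payment terms net 30", [0.9, 0.1]),
    ("contract end date 2025", [0.8, 0.2]),
    ("unrelated appendix", [0.0, 1.0]),
]
result = retrieve_then_rerank(
    "contract end date", [1.0, 0.0], docs, similarity_top_k=2, top_n=1
)
print(result)  # prints ['contract end date 2025']
```

The example shows why the second stage matters: the nearest vector belongs to the payment-terms text, but the reranker promotes the text that actually answers the query.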

## 10. Query the RAG Pipeline!
All the work is complete! Now we can ask questions about the document, whether the information is contained in text or in tables.

In [None]:
query_1 = """You are an AI assistant specialized in analyzing legal contracts and long term agreements.
Your task is to extract relevant information from a given contract document.
Your output must be a structured JSON object.

Instructions:
1. Carefully read the entire contract document.
2. Extract the relevant information.
3. Present your findings in JSON format as specified below.

Important Notes:
- Extract only relevant information.
- Consider the context of the entire contract when determining relevance.
- Do not be verbose, only respond with the correct format and information.
- Some docs may have multiple relevant excerpts -- include all that apply.
- Some questions may have no relevant excerpts -- just return ["N/A"].
- Do not include additional JSON keys beyond the ones listed here.
- Do not include the same key multiple times in the JSON.

Expected JSON keys and explanation of what they are:
- 'contract_end_date': The end date of the contract.
- 'item_identifier': Comma-separated list of the items in the contract
- 'Party1': First Party name
- 'Party1_address': First Party address
- 'Party2': Second Party name
- 'Party2_address': Second Party address
- 'signing_date': The date the contract was signed.
- 'contract_start_date': The start date of the contract.
- 'term_of_payment': Description of the payment terms.
- 'contract_value': Value of contract if mentioned.
- 'contract_number': ID of contract.
- 'contract_type': Type of contract.
"""

response_1 = query_engine.query(query_1)

print(str(response_1))


```json
{
  "contract_end_date": "September 30, 2025",
  "item_identifier": "N/A",
  "Party1": "Honeywell International Inc.",
  "Party1_address": "1300 W Warner Rd, Tempe, AZ 85284",
  "Party2": "United Avionics Inc",
  "Party2_address": "38 Great Hill Rd, Naugatuck, CT 06770",
  "signing_date": "March 1, 2021",
  "contract_start_date": "October 1, 2020",
  "term_of_payment": "N/A",
  "contract_value": "N/A",
  "contract_number": "W56HZV-20-D-0062",
  "contract_type": "Stand-Alone Government Program Contract"
}
```
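
The model wraps its answer in a Markdown `json` code fence, so the string has to be unwrapped before it can be used as data. The sketch below shows one way to do that with the standard library; `parse_json_response` is a hypothetical helper, and it assumes the reply contains at most one fenced block.

```python
import json
import re

# Three backticks, built this way so the example itself stays fence-safe.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_response(text):
    """Parse JSON from an LLM reply, with or without a surrounding code fence."""
    match = FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text
    # json.loads raises json.JSONDecodeError if the payload is not valid JSON.
    return json.loads(payload)
```

For the response above, `parse_json_response(str(response_1))` would return a dictionary whose keys can be checked against the expected list in the prompt.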


In [None]:
query_1 = """You are an AI assistant specialized in analyzing legal contracts and long term agreements.
Your task is to extract goods/part numbers/product and their related information from a given contract document.
Your output must be a structured JSON object.

Instructions:
1. Carefully read the entire contract document.
2. Extract the relevant information most probably represented in one or multiple tables.
3. Present your findings in JSON format as specified below.

Important Notes:
- Extract only relevant information.
- Consider the context of the entire contract when determining relevance.
- Do not be verbose, only respond with the correct format and information.
- Some docs may have multiple relevant excerpts -- include all that apply.
- Some questions may have no relevant excerpts -- just return ["N/A"].
- Do not include additional JSON keys beyond the ones listed here.
- Do not include the same key multiple times in the JSON.

Expected JSON keys and explanation of what they are:
- Part Number: Unique identifier for each component.
- Description: Brief description of the component.
- Lead Time: Time in days required to deliver the component.
- Capacity: Maximum production capacity for the component.
- Quantity: Number of units available.
- Unit Price: Price per individual unit of the component.
- Extended Price: Total price for the quantity available.
- Item: Indicates if the item is relevant.
- Award: Indicates if the item has been awarded.
- Truthful Cost Data Applies: Indicates if truthful cost data is applicable.
"""

response_1 = query_engine.query(query_1)

print(str(response_1))


```json
[
    {
        "Part Number": "N/A",
        "Description": "N/A",
        "Lead Time": "N/A",
        "Capacity": "N/A",
        "Quantity": "N/A",
        "Unit Price": "N/A",
        "Extended Price": "N/A",
        "Item": "N/A",
        "Award": "N/A",
        "Truthful Cost Data Applies": "N/A"
    }
]
```


In [None]:
query_1 = """You are an AI assistant specialized in analyzing legal contracts and long term agreements.
Your task is to extract goods/part numbers/products and their related information, such as quantities and prices, from the Attachment 1 section of the given contract document.
Present these details in tabular format.
"""

response_1 = query_engine.query(query_1)

print(str(response_1))


Based on the provided context, here is the extracted information from Attachment 1 of the contract document, presented in a tabular format:

| Part Number | Description | Lead Time (days) | Capacity | Quantity | Unit Price | Extended Price | Item | Award | Truthful Cost Data Applies |
|-------------|-------------|------------------|----------|----------|------------|----------------|------|-------|----------------------------|
| (Data not provided) | (Data not provided) | (Data not provided) | (Data not provided) | (Data not provided) | (Data not provided) | (Data not provided) | (Data not provided) | (Data not provided) | (Data not provided) |

Unfortunately, the specific details such as part numbers, descriptions, lead times, capacities, quantities, unit prices, extended prices, item inclusion, award status, and truthful cost data applicability are not explicitly provided in the given context. To present a complete table, the actual data from Attachment 1 would be required.


In [None]:
query_1 = """You are an AI assistant specialized in analyzing legal contracts and long term agreements.
Can you extract the table with part numbers/products/goods for ordering and production?
"""

response_1 = query_engine.query(query_1)

print(str(response_1))


The context provided does not include the actual table with part numbers, descriptions, lead times, capacities, quantities, unit prices, extended prices, and other relevant information for ordering and production. It only describes the structure and content of such a table. 

To extract the table with part numbers/products/goods for ordering and production, you would need to refer to the specific attachment or section of the contract that contains this detailed information. The context mentions that this information is listed in TABLE 1 under the section "Production Prices" for the base years award (October 1, 2020 - September 30, 2022 Delivery Dates), but the actual table content is not provided in the text.

If you have access to the full document or the specific attachment, you should look for TABLE 1 or the section that lists the details about various electrical harness assemblies and related components. This table will include columns such as Part Number, Description, Lead Time, C

In [None]:
query_1 = """You are an assistant tasked with summarizing tables and text.
Convert Table 1 into readable text and display it in tabular format.
"""

response_1 = query_engine.query(query_1)

print(str(response_1))


Sure, I can convert the information from Table 1 into a readable text format and then display it in a tabular format. Here is the information:

### Readable Text Format:
Table 1 provides details about various electrical harness assemblies, including their part numbers, descriptions, lead times, capacities, quantities, unit prices, extended prices, and other relevant information. The columns in the table include:
- Part Number: Unique identifier for each part.
- Description: Brief description of the part.
- Lead Time: Time required to deliver the part (in days).
- Capacity: Maximum number of parts that can be produced.
- Quantity: Number of parts ordered.
- Unit Price: Price per unit of the part.
- Extended Price: Total price for the ordered quantity.
- Item: Indicates if the item is included in the order.
- Award: Indicates if the item has been awarded.
- Truthful Cost Data Applies: Indicates if truthful cost data applies to the item.

### Tabular Format:
| Part Number | Description | 

## Delete the KDB.AI Table

Once finished with the table, it is best practice to drop it.

In [None]:
table.drop()