## Step 3 Validation: Query Engine Multi-Stage QA with Intermediate Outputs
This notebook validates the multi-stage QA process and prints intermediate outputs:
 1. Query decomposition (sub-queries).
 2. Retrieval of documents for each sub-query.
 3. Synthesis of the final answer.
 4. Final summary with confidence rating.

Runs are traced with LangSmith's `Client` through the `@traceable` decorator, and the retriever relies on the reranking set up in an earlier step.
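
The four stages listed above can be pictured with a small, self-contained sketch. It is not the `QueryEngine` implementation: `StageTrace`, `decompose`, `retrieve`, and `run_pipeline` are hypothetical names, and the LLM and vector store are replaced by toy string logic so that the intermediate outputs of each stage are easy to inspect.

```python
from dataclasses import dataclass, field


@dataclass
class StageTrace:
    """Intermediate outputs collected while answering one query (hypothetical type)."""
    sub_queries: list[str] = field(default_factory=list)
    retrieved: dict[str, list[str]] = field(default_factory=dict)
    answer: str = ""
    confidence: int = 0


def decompose(query: str) -> list[str]:
    # Stage 1 (toy): split on " and "; a real engine would ask the LLM instead.
    return [part.strip(" ?") + "?" for part in query.split(" and ") if part.strip()]


def retrieve(sub_query: str, corpus: dict[str, str]) -> list[str]:
    # Stage 2 (toy): keep documents that share at least one word with the sub-query.
    words = {w.lower().strip("?") for w in sub_query.split()}
    return [name for name, text in corpus.items() if words & set(text.lower().split())]


def run_pipeline(query: str, corpus: dict[str, str]) -> StageTrace:
    trace = StageTrace()
    trace.sub_queries = decompose(query)
    for sub_query in trace.sub_queries:
        trace.retrieved[sub_query] = retrieve(sub_query, corpus)

    # Stage 3 (toy): "synthesis" only lists the documents that were consulted.
    hits = sorted({doc for docs in trace.retrieved.values() for doc in docs})
    trace.answer = (
        "Sources consulted: " + ", ".join(hits) if hits else "No supporting documents found."
    )

    # Stage 4 (toy): confidence is the share of sub-queries with at least one hit.
    answered = sum(1 for docs in trace.retrieved.values() if docs)
    trace.confidence = round(100 * answered / len(trace.sub_queries)) if trace.sub_queries else 0
    return trace


corpus = {"sop_a": "humidity range table", "sop_b": "hazard classification"}
trace = run_pipeline("What is the humidity range and what is the hazard classification?", corpus)
print(trace.sub_queries)
print(trace.retrieved)
print(trace.answer, "|", trace.confidence, "%")
```

Keeping every stage's output on a single trace object is one way to make a run auditable: each sub-query can be checked against the documents it retrieved before the final answer is trusted.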

In [2]:
import logging

from langsmith import Client, traceable

from config import RESOURCES_PATH
from document_loader import load_sop_documents
from llm import get_llm
from query_engine import QueryEngine, QueryResult, search_tables_for_answer
from retriever import setup_sop_retriever
from text_processing import dynamic_text_splitter

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize LangSmith Client for tracing.
client = Client()

In [3]:
@traceable(project_name="sop_validation", client=client)
def validate_query_processing(user_query: str, retriever_instance) -> QueryResult:
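    """
    Run one query through the QueryEngine and append any matching table data.

    The unstructured text is queried first. If a returned source document
    carries a "tables" entry, its matching parameter/value pairs are appended
    to the answer.
    """
    # Instantiate the QueryEngine in debug mode so intermediate outputs are logged.
    query_engine_instance = QueryEngine(debug_mode=True, confidence_threshold=80)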

    # Execute the query on unstructured text.
    result = query_engine_instance.query_documents_advanced(user_query, retriever_instance)

    # Initialize an empty table context to avoid UnboundLocalError
    table_context = ""

    # Extract structured tables from retrieved documents (only if metadata is available)
    table_results = []
    if result.source_documents:
        for doc in result.source_documents:
            if "tables" in doc:
                table_results.extend(search_tables_for_answer([doc], user_query))

    # If table results are found, integrate them into the final response.
    if table_results:
        table_context += "\nRelevant Table Data:\n"
        for doc_name, matches in table_results:
            table_context += f"\nDocument: {doc_name}\n"
            for param, value in matches:
                table_context += f"{param}: {value}\n"

    # Append structured table results to the LLM-generated answer, only if table_context is not empty
    if table_context:
        result.answer += f"\n\n{table_context}"

    return result
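
The table-formatting step in the function above can be tried in isolation. The sketch below assumes that `search_tables_for_answer` returns a list of `(doc_name, [(param, value), ...])` pairs, which is inferred from the loop above rather than from the function's definition. `format_table_context` is a hypothetical helper name, and the sample row reuses a value that appears in the retrieval log further down.

```python
def format_table_context(table_results):
    """Render (doc_name, [(param, value), ...]) pairs as a plain-text block."""
    if not table_results:
        return ""
    lines = ["Relevant Table Data:"]
    for doc_name, matches in table_results:
        lines.append("")  # blank line between documents
        lines.append(f"Document: {doc_name}")
        lines.extend(f"{param}: {value}" for param, value in matches)
    return "\n".join(lines)


# Assumed input shape; the values are illustrative only.
sample = [("Alkylbenzen Sulfonic Acid", [("Acidic Index", "180-185 mg KOH/g")])]
print(format_table_context(sample))
```

Building the block from a list and joining it once avoids repeated string concatenation, and an empty result yields an empty string so the caller can skip appending it.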


In [4]:
# Load SOP documents and split into chunks.
documents = load_sop_documents(RESOURCES_PATH)
document_chunks = dynamic_text_splitter(documents, default_chunk_size=500)
print(f"Loaded {len(documents)} document(s) and created {len(document_chunks)} chunks.")

[DEBUG] extract_metadata_from_docx() function started.
[DEBUG] Found 6 tables in the document.
[DEBUG] Processing Table 1

[DEBUG] Table 1 Extracted Text:
AG SOLUTION MANUFACTURING PROCEDURE Review: 9 Date: 08/02/2021 | CONFIDENTIAL PRODUCT: ALKYLBENZEN SULFONIC ACID (11027563)

[MATCH] Found Review: 9
[MATCH] Found Date: 08/02/2021
[DEBUG] Processing Table 2

[DEBUG] Table 2 Extracted Text:
PLANT : | SULFAX |  |  | 
REACTOR : | R503 |  |  | 

[DEBUG] Processing Table 3

[DEBUG] Table 3 Extracted Text:
HAZARD CLASSIFICATION OF THE FINAL PRODUCT: | HAZARD CLASSIFICATION OF THE FINAL PRODUCT: | H302- Harmful if swallowed. H314-Causes severe skin burns and serious eye damage. | H302- Harmful if swallowed. H314-Causes severe skin burns and serious eye damage. | H302- Harmful if swallowed. H314-Causes severe skin burns and serious eye damage.
MANUFACTURING PROCEDURE | MANUFACTURING PROCEDURE | MANUFACTURING PROCEDURE | MANUFACTURING PROCEDURE | MANUFACTURING PROCEDURE
SAP CODE | DESCRIPTION

In [4]:
# Choose the model for the Query Engine.
model_choice = "azure"  # Change to "ollama" or "qwen2.5" as needed.
llm_instance = get_llm(model_choice)

INFO:azure.identity._credentials.environment:Incomplete environment configuration for EnvironmentCredential. These variables are set: AZURE_TENANT_ID
INFO:azure.identity._credentials.managed_identity:ManagedIdentityCredential will use IMDS
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=REDACTED&resource=REDACTED'
Request method: 'GET'
Request headers:
    'User-Agent': 'azsdk-python-identity/1.20.0 Python/3.12.3 (Windows-11-10.0.26100-SP0)'
No body was attached to the request
INFO:azure.identity._credentials.chained:DefaultAzureCredential acquired a token from AzureCliCredential
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'https://management.azure.com/subscriptions/b98e9951-860f-464a-a9a2-f69802ca8721/resourceGroups/ai_llm/providers/Microsoft.MachineLearningServices/workspaces/agentic_rag/connections?api-version=REDACTED&category=REDACTED&includeAll=REDACTED'
Request method: 'G

In [5]:
# Set up the retriever from the SOP chunks.
retriever_instance = setup_sop_retriever(document_chunks)
queries = [
    # "When synthesizing Alkylbenzen sulfonic acid, which should be the setpoint for the sulfur trioxide when doing the sulfonation?",
    "Which range of humidity values are acceptable for the Alkylbenzen Sulfonic Acid?",
    "Describe the hazard classification of the AN-84 product.",
    # "Which are the raw materials used for the production of AS-42? How much of each raw material should be used for the production of 1 Ton of AS-42?"
]

[DEBUG] Adding 250 SOP documents to ChromaDB.


INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"


In [6]:
# Run each query and print the answer, confidence, and full metadata of every source document.
# The QueryEngine itself is created (in debug mode) inside validate_query_processing.
for query in queries:
    query_result = validate_query_processing(query, retriever_instance)
    print("=" * 80)
    print(f"Processing Query: {query}")
    print("=" * 80)
    print("Final Answer:", query_result.answer)
    print("Confidence:", query_result.confidence, "%")
    print("Source Documents Metadata:")
    for source in query_result.source_documents:
        print("-" * 40)
        for key, value in source.items():
            print(f"{key}: {value}")

  self._ollama = Ollama(model=model, temperature=temperature)
  return self._ollama(prompt, stop=stop)
  docs = qa_chain_instance.retriever.get_relevant_documents(subq)
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO:query_engine:Retrieved document content: METHOD NR                                                                                                                 VALUE
                                                                                                         Acidic Index                                                                                                                  8018                                                                                                      180-185 mg KOH/g
                  
INFO:query_engine:Retrieved document metadata: {'Date': '08/02/2021', 'Document Name': 'Alkylbenzen Sulfonic Acid', 'Revie

Processing Query: Which range of humidity values are acceptable for the Alkylbenzen Sulfonic Acid?
Final Answer: * Acceptable humidity range for Alkylbenzen Sulfonic Acid: 0.5-2%
* Confidence: 100%
Confidence: 100 %
Source Documents Metadata:
----------------------------------------
Date: 08/02/2021
Document Name: Alkylbenzen Sulfonic Acid
Review: 9
----------------------------------------
Date: 08/02/2021
Document Name: Alkylbenzen Sulfonic Acid
Review: 9


INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO:query_engine:Retrieved document content: MANUFACTURING PROCEDURE
Review: 24
Date: 24/04/2021 | CONFIDENTIAL
PRODUCTO: AN-84 (10939987)
TABLE DATA:
PLANT : | POLISAL3
REACTOR : | R003 | R002
TABLE DATA:
HAZARD CLASSIFICATION OF THE FINAL PRODUCT: | HAZARD CLASSIFICATION OF THE FINAL PRODUCT: | H302 Harmful if swallowed. H314 Causes severe skin burns and eye damage. H400 Very toxic to aquatic organisms. | H302 Harmful if swallowed. H314 Causes severe skin burns and eye damage. H400 Very toxic to aquatic organisms. | H302 Harmful if swall
INFO:query_engine:Retrieved document metadata: {'Date': '24/04/2021', 'Document Name': 'AN-84-24', 'Review': '24'}
INFO:query_engine:Retrieved document content: MANUFACTURING PROCEDURE
Review: 24
Date: 24/04/2021 | CONFIDENTIAL
PRODUCTO: AN-84 (10939987)
TABLE DATA:
PLANT : | POLISAL3
REACT

Processing Query: Describe the hazard classification of the AN-84 product.
Final Answer: * The hazard classification of the AN-84 product includes H302, H314, and H400 labels.
* H302 indicates that the substance is harmful if swallowed.
* H314 signifies that it causes severe skin burns and eye damage.
* H400 highlights its extreme toxicity towards aquatic organisms.
Confidence: 100 %
Source Documents Metadata:
----------------------------------------
Date: 24/04/2021
Document Name: AN-84-24
Review: 24
