# Agentic Self-Reflective RAG on Dell AI Factory with NVIDIA
### with Elasticsearch vector database
### Models served from K8s cluster

<img src="images/agentic-rag-pipeline.png" alt="Alternative text" />

## What is Agentic RAG?  

LLM agents extend the capabilities of traditional LLMs by blending their natural language comprehension capabilities with actionable functionalities. This is a significant advancement for AI, making it ideal for automation and intelligent decision-making across industries. Unlike traditional LLMs, which generate text based solely on their training data, LLM agents can connect to external systems such as APIs, databases, and applications to fetch live data, provide contextually relevant responses, and combine them in pipelines to enhance their utility in real-world applications.


This ability transforms LLMs from passive responders into dynamic actors capable of handling multi-step workflows and delivering actionable insights. In healthcare, for instance, LLM agents can securely synthesize information from patient records, clinical guidelines, and research databases to support timely, evidence-based decisions. These agents can assist in tasks such as patient diagnosis, treatment planning, and drug discovery, thereby enhancing the efficiency and accuracy of healthcare processes.


The power to process and act on information in real time while adhering to stringent compliance standards positions LLM agents as powerful tools for addressing complex, data-intensive challenges. The agents redefine what AI can accomplish, providing scalable, secure, and contextually relevant solutions to some of the most demanding problems in modern industries.


# NVIDIA NIMs

The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on 
NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models 
from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA 
accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt containers that deploy anywhere using a single 
command on NVIDIA accelerated infrastructure.

NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, 
NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, 
giving enterprises ownership and full control of their IP and AI application.

### About this notebook

- Single LLM role play in a multi-agent set of tasks
- Two data sources are used, RAG and a web search fall back, but more can be added to the query router.  Route A and B are available.  Route C is shown as an example.
- NVIDIA NIMS are installed on a K8s cluster and accessed via API calls
- Notebook does not need to be run on a GPU enabled machine, all GPU required services are provided by the K8s cluster.
- Features code that can assist with clickable source files
- Features a method to turn OFF the Agentic processes to show the different in results.

### Code credit and inspiration:
- https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_self_rag/#llms
- https://github.com/NVIDIA/workbench-example-agentic-rag
- David O'Dell
- Tiffany Fahmy

# Library installs

In [1]:
# %pip install -q langchain-nvidia-ai-endpoints==0.2.2
# %pip install -q langchain==0.2.16
# %pip install -q langchain-community==0.2.17                   
# %pip install -q langchain-core==0.2.40
# %pip install -q langchain-text-splitters==0.2.4
# %pip install -q langchain-openai==0.1.23
# %pip install -q pdfminer-six==20231228
# %pip install -q pillow-heif==0.18.0
# %pip install -q opencv-python==4.10.0.84 
# %pip install -q unstructured==0.15.9
# %pip install -q unstructured-pytesseract==0.3.12
# %pip install -q pi-heif==0.18.0
# %pip install -q unstructured-inference==0.7.36
# %pip install -q tesseract==0.1.3
# %pip install -q pytesseract==0.3.10
# %pip install -q langgraph==0.2.15
# %pip install -q gradio==4.27.0
# %pip install -q elasticsearch==8.15.1
# %pip install -q tiktoken==0.8.0
# %pip install -q langchain-elasticsearch==0.2.2

## Modified
# %pip install -q gradio==5.10.0

### Set debug and verbosity

In [2]:
from langchain.globals import set_verbose, set_debug

set_debug(True)
set_verbose(True)

# Import Libraries

In [3]:
import nltk  
print(nltk.__version__)

3.9.1


In [4]:
### import loaders
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain_community.document_loaders import PyPDFLoader
from langchain.document_loaders import CSVLoader
# from langchain_community.document_loaders import WebBaseLoader
# from langchain_community.document_loaders import OnlinePDFLoader
from langchain_community.document_loaders.merge import MergedDataLoader

### for embedding
# from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

# from langchain_community.vectorstores import Chroma

### status bars and UI and other accessories
from tqdm import tqdm
import time

# Declare external services

Services that will be hosted outside this application, usually the LLM, the vectordb and anything else.

## Langsmith Tracing Setup

In [5]:
import os

### Consider adding these as env vars in AI Workbench to enable LangSmith tracing ###
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ["LANGCHAIN_PROJECT"] = "nvd-agentic-RAG-test"
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = "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"


### Define Local LLM for initial testing

##### Model NIM, Embeddingn and Rerank will all have different ports.  In this case we used 30001, 30002, 30003. 

In [6]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.llms import VLLM
from langchain_openai import OpenAI

In [7]:
# model_id = "meta/llama-3.1-8b-instruct"
model_id = "meta/llama-3.1-70b-instruct"
api_url = "http://llama31-70b-nim-nvidia-nim.apps.ai-ocp.emea.dsc.local/v1"


llm = ChatOpenAI(
    base_url=api_url,
    api_key="YOUR API KEY", #API key for what? The NIM doesn't require API key?
    model=model_id,
    temperature=0,
    max_tokens=None,
)

### Define embeddings options

In [8]:
embeddings = NVIDIAEmbeddings(
    base_url="http://www.nv-embedding.apps.ai-ocp.emea.dsc.local/v1", 
    model="nvidia/nv-embedqa-e5-v5",
    truncate="END"
)

### Define reranking options

In [9]:
reranker = NVIDIARerank(
    base_url="http://www.rerankqa-mistral-4b.apps.ai-ocp.emea.dsc.local/v1", 
    model="nvidia/nv-rerankqa-mistral-4b-v3",
    truncate="END"
)

## Define Elasticsearch vector db instance

using this as inspiration:  https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/generative-ai/chatbot.ipynb

In [10]:
from elasticsearch import Elasticsearch
from langchain_elasticsearch import ElasticsearchStore

In [11]:
### set certificate permissions to 644

In [12]:
CERTIFICATE = "/home/demo/es_http_ca.crt"
HOST = "https://www.elasticsearch.apps.ai-ocp.emea.dsc.local"
USER = "elastic"
PASSWORD = "dyak3z319h71u0Ch7y65avQs"

In [13]:
es_client = Elasticsearch(
    hosts=HOST,
    basic_auth=(USER, PASSWORD),
    verify_certs=False,
    # ca_certs=CERTIFICATE,
    # verify_certs=False,
    # ca_certs=False,
    # ca_certs=True,    
    # connection_class=RequestsHttpConnection,
)

  _transport = transport_class(


In [14]:
print(es_client.ping())
print(es_client)
print(es_client.info())

True
<Elasticsearch(['https://www.elasticsearch.apps.ai-ocp.emea.dsc.local:443'])>
{'name': 'elasticsearch-sample-es-default-0', 'cluster_name': 'elasticsearch-sample', 'cluster_uuid': 'WkoIxxUjSP-XDtgB1H_KCg', 'version': {'number': '8.16.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '12ff76a92922609df4aba61a368e7adf65589749', 'build_date': '2024-11-08T10:05:56.292914697Z', 'build_snapshot': False, 'lucene_version': '9.12.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}




# Vector db Content setup

### Load PDF data into loader object

##### We want the pdf files to be clickable so we set up a prefix appended to each file that points to the server they are residing at.

In [15]:
# Directory containing PDF files
pdf_directory = "docs"
# url_prefix = "http://IP ADDRESS OF FILE SERVER/" #Where do we get the pdfs? Would be easier to put them locally for time being and making prefix empty
url_prefix = ""

# Load PDF documents
pdf_dir_loader = PyPDFDirectoryLoader(pdf_directory)


### Load CSV data into loader object

In [16]:
patient_data_csv_loader = CSVLoader("docs/healthcare_dataset.csv", encoding='windows-1252')

### view CSV head contents

In [17]:
import pandas as pd

In [18]:
df = pd.read_csv('docs/healthcare_dataset.csv')
print(df.head(5))

  Patient Name Date of Birth  Age Reason for Admission       Procedure  \
0    Patient_3      8/6/2004   19                COVID  antiviral meds   
1   Patient_39     3/19/2004   19                COVID  oxygen therapy   
2   Patient_33     12/8/2002   21                COVID  oxygen therapy   
3   Patient_40      7/6/2001   23                  Flu  antiviral meds   
4    Patient_1    10/28/1999   24            Pneumonia     antibiotics   

       Room Date of Discharge  Length of Stay   Charges  Balance Remaining  
0  Room_237        12/27/2023               0    237.34              37.34  
1  Room_440         1/12/2024              22  29980.84            3018.84  
2  Room_298        12/31/2023               6   4028.77            2681.26  
3  Room_360        11/30/2023               0    273.69               0.00  
4  Room_239          1/5/2024               1    490.50             341.28  


In [19]:
num_rows = df.shape[0]
print(f"Total number of rows: {num_rows}")

Total number of rows: 50


## merge pdf and csv

In [20]:
# Merge the PDF and CSV loaders into a single dataset
merged_loader = MergedDataLoader(loaders=[pdf_dir_loader, patient_data_csv_loader])

# Load all the merged documents
merged_documents = merged_loader.load()

In [21]:
len(merged_documents)

280

In [22]:
### CSV file rows are broken down and made into one document per row
### 230 PDf file documents, + 50 rows of CSV file = 280 documents

PDF loader seems to treat each page as separate entry

In [23]:
merged_documents[6]

Document(metadata={'source': 'docs/skin_cancer_prevention_update.pdf', 'page': 4}, page_content='July 15, 2005 U Volume 72, Number 2 www.aafp.org/afp American Family Physician  273\ndiagnosis. 25 Some dermatology experts also add E for \nevolution or for elevation above skin level. In general, \nbenign lesions are round and symmetric, while melano-\nmas are asymmetric. Benign lesions usually have regu-\nlar margins, while melanomas have irregular borders. \nBenign lesions are uniform in color, while melanomas \nare more heterogeneous, with colors ranging from tan \nto brown and black, often with areas of red, white, or \nblue. Finally, most benign lesions are smaller than 6 mm \nin diameter, while melanomas often are larger than 6 \nmm at the time of diagnosis (Figures 3 through 10) . The \nABCD checklist is a sensitive diagnostic test (90 to 100 \npercent, depending on whether a positive test is defined \nas the presence of one, two, or three of the ABCDs), but \nthe specificity is no

## Transform source format to include URL for pdf chunks in documents 
This assumes an nginx instance running and pointing to a mounted pdf directory

In [24]:
# Prepend URL prefix to the source in metadata

pdf_count = 0
csv_count = 0

for doc in merged_documents:
    if 'source' in doc.metadata:
        # Remove the directory part from the source path
        file_name = doc.metadata['source'].replace(pdf_directory + "/docs", "")
        doc.metadata['source'] = url_prefix + file_name

        # Count the number of PDF and CSV documents
        if file_name.lower().endswith('.pdf'):
            pdf_count += 1
        elif file_name.lower().endswith('.csv'):
            csv_count += 1

# Print the total number of PDF and CSV documents
print(f"Total PDF documents: {pdf_count}")
print(f"Total CSV rows: {csv_count}")


# Print the updated sources to verify
for doc in merged_documents[:5]:  # Print first 5 for verification
    print(doc.metadata['source'])

Total PDF documents: 230
Total CSV rows: 50
docs/mental_health_first_aid.pdf
docs/mental_health_first_aid.pdf
docs/skin_cancer_prevention_update.pdf
docs/skin_cancer_prevention_update.pdf
docs/skin_cancer_prevention_update.pdf


### Chunk and split documents

Each document will be chunked and split along the chunk_size parameter.  The overlap parameter will ADD to the amount of characters, so 512 plus 256 overlap will equal a split size of around 800.  An overlap of zero will equal a split size of only the chunk value.

In [25]:
# text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=0)
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=512, chunk_overlap=64)
doc_splits = text_splitter.split_documents(merged_documents)

## Prepare Elasticsearch Index

In [26]:
INDEX_NAME = "nvd_agentic_rag_updated_merged_chunked_index"

### Delete and rebuild ES index

NOTE:  If you don't delete the index and start embedding duplicates into an existing index, you will get extremely bad performance and errors due to mulitiple documents with the same content.  

In [27]:
# Check if the index exists and delete it if it does
if es_client.indices.exists(index=INDEX_NAME):
    es_client.indices.delete(index=INDEX_NAME)
    print(f"Index '{INDEX_NAME}' deleted successfully.")
else:
    print(f"Index '{INDEX_NAME}' does not exist, will be created in the next step.")



Index 'nvd_agentic_rag_updated_merged_chunked_index' deleted successfully.


### Initial embed documents into vector store

In [28]:
%%time

vectorstore = ElasticsearchStore.from_documents(
    doc_splits,
    embeddings,
    index_name=INDEX_NAME,
    es_connection=es_client,
)

print('\n' + 'Time to complete:')




Time to complete:
CPU times: user 469 ms, sys: 123 ms, total: 592 ms
Wall time: 4.68 s


### Verify document structure in Elasticsearch

In [29]:
# Function to check the structure of documents in the index
def check_document_structure(index_name, es_client, num_docs=2):
    # Search for documents in the index
    response = es_client.search(
        index=index_name,
        body={
            "query": {
                "match_all": {}
            },
            "size": num_docs
        }
    )

    # Check if documents are found
    if response['hits']['total']['value'] > 0:
        print(f"Found {response['hits']['total']['value']} documents in the index '{index_name}'.")
        for doc in response['hits']['hits']:
            print(f"Document ID: {doc['_id']}")
            print(f"Document structure: {doc['_source']}")
            print("-" * 80)
    else:
        print(f"No documents found in the index '{index_name}'.")

# Check the structure of documents
check_document_structure(INDEX_NAME, es_client)

Found 687 documents in the index 'nvd_agentic_rag_updated_merged_chunked_index'.
Document ID: da81a066-df52-462e-a120-3adc470da95b
Document structure: {'text': '133\nMei\xa0C, McGorry\xa0PD. Evid Based Ment Health 2020;23:133–134. doi:10.1136/ebmental-2020-300154\nPerspective\nMental health first aid: strengthening its impact for \naid\xa0recipients\nCristina Mei    ,1,2 Patrick D McGorry1,2\nTo cite: Mei\xa0C, McGorry\xa0PD. \nEvid Based Ment Health \n2020;23:133–134.\n1Orygen, Parkville, Victoria, \nAustralia\n2Centre for Youth Mental \nHealth, University of Melbourne, \nParkville, Victoria, Australia\nCorrespondence to\nProfessor Patrick D McGorry, \nOrygen, Parkville, VIC 3052, \nAustralia;  pat. mcgorry@ orygen. \norg. au\nReceived 6 April 2020\nRevised 4 June 2020\nAccepted 9 June 2020\nPublished Online First \n29\xa0July\xa02020\n© Author(s) (or their \nemployer(s)) 2020. No \ncommercial re- use. See rights \nand permissions. Published \nby BMJ.\nABSTRACT\nMental Health First Ai



### Create direct vectorstore retriever

In [30]:
retriever = vectorstore.as_retriever()

In [31]:
import json

def get_unique_files(merged_documents):
    file_list = []
    
    for doc in merged_documents:
        source = doc.metadata.get('source', None)
        if source:
            file_list.append(source)
    
    # Use a set to get unique file URLs
    unique_list = list(set(file_list))
    
    # Sort the unique list by file extension
    sorted_unique_list = sorted(unique_list, key=lambda x: x.split('.')[-1])
    
    print("\nList of unique files in merged loader, sorted by file type:\n")
    for unique_file in sorted_unique_list:
        print(unique_file)
    
    pretty_files = json.dumps(sorted_unique_list, indent=4, default=str)
    
    return pretty_files

# Example usage
unique_files_sorted = get_unique_files(merged_documents)



List of unique files in merged loader, sorted by file type:

docs/healthcare_dataset.csv
docs/ipilimumab-for-advanced-melanoma-a-pharmacologic-perspective.pdf
docs/covid_predictor_in_older_patients.pdf
docs/skin_cancer_screening.pdf
docs/covid_variants.pdf
docs/covid_depression_in_72_yr_old_case_study.pdf
docs/mental_health_first_aid.pdf
docs/covid_update.pdf
docs/population_based_approach_to_mental_health.pdf
docs/understanding-pharmacology-covid-mrna.pdf
docs/skin_cancer_prevention_update.pdf
docs/investigating-the-efficacy-of-osimertinib-and-crizotinib-in-phase-3-clinical-trials-on-anti-cancer.pdf
docs/skin_cancer_cells.pdf


# Setup and Test Agent pipeline elements

### Generate RAG Response

In [32]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Prompt
llm_prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant in a health care clinic. <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "document"],
)


# Chain
llm_chain = llm_prompt | llm | StrOutputParser()

# test
# question = "Tell me about mental health from a population perspective."
# generation = llm_chain.invoke({"question": question})


In [34]:
# print(generation)

### LLM-only toggle function

Send request only to LLM directly, bypass RAG vector search

In [35]:
def get_llm_response(question):
    generation = llm_chain.invoke({"question": question})
    
    return generation

### RAG chain setup

## https://medium.com/@callumjmac/implementing-rag-in-langchain-with-chroma-a-step-by-step-guide-16fc21815339

In [36]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.schema import Document

# Prompt
rag_prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant in a health care clinic. 
    Use the following pieces of retrieved context to answer the question first. Always ensure you directly address the user's question explicitly and focus on providing a clear and accurate answer. 
    After answering the question, provide actionable next steps as the final part of your response. Ensure the next steps are practical, relevant, and tailored to the context provided. 
    If you don't know the answer, just say that you don't know. 
    Keep your responses concise, actionable, and tailored to the context provided. 
    <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Context: {context} 
    Answer and Next Steps: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],
)


# Chain
rag_chain = rag_prompt | llm | StrOutputParser()


# # Query vector db and get similarity distance scores

# question = "Tell me about mental health from a population perspective."

# search_results = vectorstore.similarity_search_with_relevance_scores(question, k=5)

# print(search_results)

# documents = [Document(page_content=str(result[0].page_content), metadata={**result[0].metadata, "score": result[1]}) for result in search_results]

# print(documents)

# # # Print the scores and rankings
# # for rank, (doc, vector_score) in enumerate(documents, start=1):
# #     print(f"-- --\n\nRank: {rank}, Score: {vector_score}, Document: {doc.page_content}")

# # # Generate the response
# # generation = rag_chain.invoke({"context": [doc.page_content for doc, _ in results], "question": question})

# for rank, doc in enumerate(documents, start=1):
#     print(f"-- --\n\nRank: {rank}, Score: {doc.metadata['score']}, Page: {doc.metadata['page']}, Source: {doc.metadata['source']}, Document: {doc.page_content}")

# # Generate the response
# generation = rag_chain.invoke({"context": [doc.page_content for doc in documents], "question": question})




### Question Router Chain setup


In [37]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser

router_prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an expert at routing a 
    user question to a vectorstore or web search. Use the vectorstore exclusively for questions related to patient data, skin cancer, covid, or mental health. 
    You do not need to be stringent with keywords in the question related to these topics. If **any document** is found to be relevant in the vectorstore, stop immediately and generate the answer using that data. 
    **Do not perform a web search** if even one relevant document is found, regardless of the overall assessment of other documents.
    If **no relevant data** is found at all in the vectorstore, or if the question is unrelated to these topics, use web_search.
    
    Provide the answer in JSON format with a single key called 'datasource' and a single answer either 'vectorstore' or 'websearch' as the value.
    Please do not include a preamble or explanation. Your response should be formatted as follows: \'{{"datasource": "value"}}\'.

    Example 1: A question that is not related to patient data, skin cancer, covid, or mental health should return with a response to use the web_search like this: \'{{"datasource": "websearch"}}\'.

    Example 2: A question that is related to patient data, skin cancer, covid, or mental health and any relevant data is found in the vectorstore should return a response like this: \'{{"datasource": "vectorstore"}}\'.

    Question to route: {question}
    <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question"],
)

question_router = router_prompt | llm | JsonOutputParser()

# question = "Tell me about mental health from a population perspective."

# docs = retriever.invoke(question)

# doc_txt = docs[1].page_content


In [38]:
print(question_router.invoke({"question": question}))

NameError: name 'question' is not defined

### Relevance / Retrieval Grader

Checks index of vectorstore to see if there are relavent docs

In [39]:
from langchain.prompts import PromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import JsonOutputParser

retrieval_prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing relevance 
    of a retrieved document to a user question. If the document contains keywords related to the user question, 
    grade it as relevant. It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question. \n
    Provide the binary score as a JSON with a single key 'relevance_yes_no_score' and no premable or explanation.
     <|eot_id|><|start_header_id|>user<|end_header_id|>
    Here is the retrieved document: \n\n {document} \n\n
    Here is the user question: {question} \n <|eot_id|><|start_header_id|>assistant<|end_header_id|>
    """,
    input_variables=["question", "document"],
)

retrieval_grader = retrieval_prompt | llm | JsonOutputParser()

# question = "Tell me about mental health from a population perspective."

# docs = retriever.invoke(question)

# doc_txt = docs[1].page_content


In [40]:
# print(retrieval_grader.invoke({"question": question, "document": doc_txt}))

### Hallucination Grader

Checks to see if the generation is grounded in truth using the source documents as a reference.  
If the generation is grounded in truth, then the hallucination grader responds positively with Yes.

If the generation is NOT grounded in truth and has no relavence with the source documents, the grader responds negatively with No.

In [41]:
hallucination_grader_prompt = PromptTemplate(
    template=""" <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether 
    an answer is grounded in / supported by a set of facts. Give a binary 'yes' or 'no' score to indicate 
    whether the answer is grounded in / supported by a set of facts. Provide the binary score in JSON format with a 
    single key 'score' and no preamble or explanation, like this \'{{\'"score": "yes"\'{{\' or \'{{\'"score": "no"\'{{\'. <|eot_id|><|start_header_id|>user<|end_header_id|>
    Here are the facts:
    \n ------- \n
    {documents} 
    \n ------- \n
    Here is the answer: {generation}  <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["generation", "documents"],
)

hallucination_grader = hallucination_grader_prompt | llm | JsonOutputParser()


In [42]:
# hallucination_grader.invoke({"documents": docs, "generation": generation})

### Answer Grader

Is the answer provided "useful" to the question.

In [43]:
answer_grader_prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether an 
    answer is useful to resolve a question. Give a binary score 'yes' or 'no' to indicate whether the answer is 
    useful to resolve a question. Provide the binary score in JSON format with a 
    single key 'score' and no preamble or explanation, like this \'{{\'"score": "yes"\'{{\' or \'{{\'"score": "no"\'{{\'. 
     <|eot_id|><|start_header_id|>user<|end_header_id|> Here is the answer:
    \n ------- \n
    {generation} 
    \n ------- \n
    Here is the question: {question} <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["generation", "question"],
)

answer_grader = answer_grader_prompt | llm | JsonOutputParser()


In [44]:
# answer_grader.invoke({"question": question, "generation": generation})

### Web Search

uses the python library for Tavily open search.  Create an account and API here:
https://blog.tavily.com/getting-started-with-the-tavily-search-api/

In [45]:
os.environ["TAVILY_API_KEY"] = "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"

In [46]:
from langchain_community.tools import TavilySearchResults

web_search_tool = TavilySearchResults(max_results=5)

### Graph relations function setup

In [47]:
from typing_extensions import TypedDict
from typing import List
from langchain.schema import Document

################################ State ##############################

class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        web_search: whether to add internet search
        documents: list of documents
    """

    question: str
    generation: str
    web_search: str
    documents: List[str]


### Question router function setup

In [48]:
def route_question(state):
    """
    Route question to web search or RAG.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("---ROUTE QUESTION---")
    question = state["question"]
    print(question)

    
    # Status message
    global router_status, router_choice, routing_agent_panel_content
    
    target_source = question_router.invoke({"question": question})
    
    print(target_source)

    # Initialize colors for routing panel in UI
    web_color = "white"
    rag_color = "white"

    # Check if source is a dictionary
    if isinstance(target_source, dict):
        if "datasource" in target_source:
            print(target_source["datasource"])
            router_choice = target_source["datasource"]
            
            if target_source["datasource"] == "websearch":
                print("---DECISION: ROUTE QUESTION TO WEB SEARCH---")
                router_status = "success"
                web_color = "#4CBB17"
                rag_color = "white"
                
            elif target_source["datasource"] == "vectorstore":
                print("---DECISION: ROUTE QUESTION TO RAG---")
                router_status = "success"
                web_color = "white"
                rag_color = "#4CBB17"
                
        else:
            print("Error: 'datasource' key not found in source")
    else:
        print("Error: source is not a dictionary")

    
    # HTML table generation with green-colored cell for the routed choice
    # print(rag_color)
    # print(web_color)

    routing_agent_panel_content = f"""
    <table style="width: 100%;">
        <tbody>
            <tr>
                <td style="padding: 1; width: 50%; background-color: {web_color}; text-align: center;"><b>Web Search</b></td>
                <td style="padding: 1; width: 50%; background-color: {rag_color}; text-align: center;"><b>RAG database</b></td>
            </tr>
        </tbody>
    </table>
    """

    ### finish and return the results
    
    if router_choice == "websearch":
        return "websearch"
    elif router_choice == "vectorstore":
        return "vectorstore"
    else:
        return None

### Retrieval function setup

In [49]:
def retrieve(state):
    """
    Retrieve documents from vectorstore

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE USING NVIDIA EMBEDDINGS NIM---")
    question = state["question"]


    ### set nearest neighbors here, it will be used in other function as well
    
    global k_nearest
    
    k_nearest = 5
    
    # Retrieval now with vector similarity score
    search_results = vectorstore.similarity_search_with_score(question, k_nearest)


    documents = [Document(page_content=str(result[0].page_content), metadata={**result[0].metadata, "score": result[1]}) for result in search_results]
    
    # print(documents)
    
    # Status message
    global retrieve_status
    
    retrieve_status = "success"
    
    return {"documents": documents, "question": question}

### Rerank function setup

In [50]:
def rerank(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---NVIDIA RERANK NIM PROCESS---")
    question = state["question"]
    documents = state["documents"]

#### reranking NIM will create and reformat the metatdata with a new relevance_score key value, the docs will reflect this new metadata
    
    # Reranking
    documents = reranker.compress_documents(query=question, documents=documents)

    # print(documents)

    # Status message
    global rerank_status
    rerank_status = "success"
    
    return {"documents": documents, "question": question}

### Grade document relevance function setup
This will grade and create a list of relevant and not relevant docs as well as the count of those docs. This will print to raw output but will also be available for use in other functions.

In [51]:
def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.
    If no document is relevant, we will set a flag to run web search.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Filtered out irrelevant documents and updated web_search state
    """

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    
    global filtered_docs  # Declare the global variable to place docs in to be accessed in other functions
    global not_relevant_docs  # Declare the global variable to place not relevant docs
    global not_relevant_count  # Declare the global variable to place not relevant docs
    global relevant_count  # Add this line


    # Score each doc
    filtered_docs = []
    not_relevant_docs = []

    web_search = "Yes"  # Default to Yes in case there is no relevant doc

    ## take the page content value of documents retrieved and grade it against the question,
    ## this means the filtered_docs array will only contain page content values, not file names

    # Counter for not relevant documents
    not_relevant_count = 0
    relevant_count = 0  # Initialize relevant_count

    
### CATEGORIZE RELEVANT VS NOT RELEVANT DOCS


    
    for doc in documents:
        relevance_yes_no_score = retrieval_grader.invoke(
            {"question": question, "document": doc.page_content}
        )
        grade = relevance_yes_no_score["relevance_yes_no_score"]
        
        print(grade)

        # Extract the source from metadata
        source = doc.metadata.get('source', 'Unknown Source')
        
        # Extract a snippet of the page content
        snippet = doc.page_content[:200]  # Adjust the number of characters as needed

        # # Extract the similarity score - a score of the distance between 2 vectors, lower number is best
        # similarity_score = doc.metadata.get('score', 'Unknown Score')

        # # Extract the reranker relevance score - the higher the score is best
        # rerank_relevance_score = doc.metadata.get('relevance_score', 'Unknown Score')

        
        # Document relevant
        if grade.lower() == "yes":   ### set to lower case 
            # print(f"---GRADE: DOCUMENT RELEVANT---\nSource: {source}\nSnippet: {snippet}\nVector Distance Score: {similarity_score}\nRerank Relevance Score: {rerank_relevance_score}")
            filtered_docs.append(doc)
            # Since we found at least one relevant document, set web_search to "No"
            web_search = "No"
            relevant_count += 1  # Increment the relevant counter
            
            
        # Document not relevant
        else:
            # print(f"---GRADE: DOCUMENT NOT RELEVANT---\nSource: {source}\nSnippet: {snippet}\nVector Distance Score: {similarity_score}\nRerank Relevance Score: {rerank_relevance_score}")
            not_relevant_docs.append(doc)
            not_relevant_count += 1  # Increment the counter
            # Do not include the document in filtered_docs
            # will default to web search = yes
            continue

 
    # Status message
    global relevance_status
    relevance_status = "success"

    global relevance_report_msg
    
    # Check if relevant documents are less than half of the total documents
    if relevant_count < len(documents) / 2:
        next_steps = "Try rephrasing your question, or adding more documents related to the question."
    else:
        next_steps = ""
    
    relevance_report_msg = f"""
        <p style="font-size: large;">
            Found <span style="color: black;">{relevant_count}</span> out of <span style="color: black;">{len(documents)}</span> documents to be most relevant to the question.
        </p>
        <p style="font-size: large">
            {next_steps}
        </p>
    """

    return {"documents": filtered_docs, "question": question, "web_search": web_search, "relevant_count": relevant_count}


### Answer Reliability meter function setup
The logic here is that more graded relevant document snippets will typically lead to a more reliable answer 

In [52]:
def answer_reliability_meter(relevant_count):
    global k_nearest

    # Increase the number of blocks by a factor of 4 to size the meter and make it look more visible
    relevant_count *= 4
    # relevant_count = min(relevant_count * 4, 20)  # Limit to 25 blocks max
    local_k_nearest = k_nearest * 3

    # Calculate relevance percentage based on the scaled values
    relevance_percentage = min((relevant_count / local_k_nearest) * 100, 100)
    
    if relevant_count > local_k_nearest / 2:
        # High relevance: more than half of local_k_nearest are relevant
        filled_square = "<span style='color: green;'>&#9632;</span>" * relevant_count  # Filled square: ■
        empty_square = "<span style='color: lightgray;'>&#9633;</span>" * (local_k_nearest - relevant_count)  # Empty square: □
    else:
        # Low relevance: less than or equal to half of local_k_nearest are relevant
        filled_square = "<span style='color: yellow;'>&#9632;</span>" * relevant_count  # Yellow square: ■
        empty_square = "<span style='color: lightgray;'>&#9633;</span>" * (local_k_nearest - relevant_count)  # Empty square: □

    # return f"""
    # <div style='font-size: 24px; display: flex; align-items: center; justify-content: space-between; 
    #             width: 100%; max-width: 500px; white-space: nowrap; overflow: hidden;'>
    #     <div style='display: flex; gap: 2px; max-width: 420px; overflow: hidden; flex-shrink: 0;'>{filled_square}{empty_square}</div>
    #     <span style='font-size: 18px; margin-left: 8px;'>{relevance_percentage:.0f}%</span>
    # </div>"""


    # # Return the meter with a percentage display without fixing the width
    # return f"""
    # <div style='font-size: 24px; display: flex; flex-wrap: wrap; align-items: center; width: 100%; max-width: 400px;'>
    #     {filled_square + empty_square}
    #     <span style='font-size: 18px; margin-left: 10px;'>{relevance_percentage:.0f}%</span>
    # </div>"""

    return f"<div style='font-size: 24px; display: flex; align-items: center; justify-content: space-between; width: 100%;'>" \
           f"{filled_square + empty_square}" \
           f"<span style='font-size: 18px;'>{relevance_percentage:.0f}%</span>" \
           f"</div>"


### Decide to Generate or web fallback function setup

In [53]:
def decide_to_generate(state):
    """
    Determines whether to generate an answer, or add web search.

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """

    print("---ASSESS GRADED DOCUMENTS---")
    question = state["question"]
    web_search = state["web_search"]
    filtered_documents = state["documents"]

    global web_fallback_status
    
    if web_search == "Yes":
        # No relevant documents were found, so fall back to web search
        print(
            "---DECISION: RAG DOCS DO NOT CONTAIN RELEVANT CONTENT, FALLING BACK TO WEBSEARCH---"
        )
        web_fallback_status = "success"
        return "websearch"
    else:
        # We have relevant documents, so generate the answer
        print("---DECISION: GENERATE ANSWER---")
        return "generate"



In [54]:
def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to documents
    """

    print("---WEB SEARCH---")
    question = state["question"]
    
    # Check if 'documents' key exists in state, if not, initialize it
    if "documents" not in state:
        state["documents"] = []

    ## passes existing documents list to function, there may or may not be doc in the array
    
    documents = state["documents"]

    # Web search
    docs = web_search_tool.invoke({"query": question})

    # Transform the keys from 'url' to 'source' and 'content' to 'page_content'
    transformed_docs = [{"source": d["url"], "page_content": d["content"]} for d in docs]

    # Create Document objects with the transformed results
    for doc in transformed_docs:
        document = Document(metadata={'source': doc['source']}, page_content=doc['page_content'])
        documents.append(document)


    # # Join the transformed documents into a single string
    # web_results = "\n".join([f"source: {d['source']}\npage_content: {d['page_content']}" for d in transformed_docs])

    # web_results = "\n".join([d["content"] for d in docs])
    # web_results = Document(page_content=web_results)

    ## adds the web results to the existing documents list
    
    # documents.append(web_results)

    # Status message
    global websearch_status
    websearch_status = "success"
    
    return {"documents": documents, "question": question}



### Web search function setup

In [55]:
def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to documents
    """

    print("---WEB SEARCH---")
    question = state["question"]
    
    # Check if 'documents' key exists in state, if not, initialize it
    if "documents" not in state:
        state["documents"] = []

    ## passes existing documents list to function, there may or may not be doc in the array
    
    documents = state["documents"]

    # Web search
    docs = web_search_tool.invoke({"query": question})

    
    # Transform the keys from 'url' to 'source' and 'content' to 'page_content'
    transformed_docs = [{"source": d["url"], "page_content": d["content"]} for d in docs]

    # Create Document objects with the transformed results
    for doc in transformed_docs:
        document = Document(page_content=doc['page_content'], metadata={'source': doc['source']})
        documents.append(document)


    
    global relevance_report_msg
    
    relevance_report_msg = f"""
            <p style="font-size: large;">
                RAG database lacked relevant documents, Agent has diverted question to Web Search.
            </p>
        """
    
    global relevant_count, k_nearest, websearch_status
    
    relevant_count = len(documents)
    k_nearest = len(documents)
    websearch_status = "success"
    
    return {"documents": documents, "question": question}


### Generate answer function setup

In [56]:
def generate(state):
    """
    Generate answer using RAG on retrieved documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE AN ANSWER---")
    question = state["question"]
    documents = state["documents"]


    # RAG generation
    generation = rag_chain.invoke({"context": documents, "question": question})
    
    return {"documents": documents, "question": question, "generation": generation}


### Calculate NVIDIA green rank sources function setup
this is for the GUI, the rerank source docs list have a descending shade of green

In [57]:
def calculate_nvidia_green_intensity(rank, max_rank=1):
    """
    Adjust the green intensity based on rank.
    - If there's only 1 document, default to medium green.
    - If multiple documents exist, Rank 1 is lightest, and higher ranks get darker.
    """
    nvidia_green_base = (118, 185, 0)  # Base NVIDIA green (#76B900)

    # Handle the case where there's only one document (avoid division by zero)
    if max_rank == 1:
        green_intensity_factor = 0.75  # Default to medium green if only 1 document
    else:
        normalized_rank = (rank - 1) / (max_rank - 1)  # Rank 1 -> 0 (lightest), max_rank -> 1 (darkest)
        green_intensity_factor = 0.5 + (normalized_rank * 0.5)  # Gradual shift to darker green

    # Apply the adjusted green intensity
    green_intensity = tuple(int(value * green_intensity_factor) for value in nvidia_green_base)

    return f'rgb({green_intensity[0]}, {green_intensity[1]}, {green_intensity[2]})'


### Usefulness Grade answer vs documents vs question function setup

In [58]:
################################ Conditional Edge ##############################

def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the document and answers question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """

    print("---HALLUCINATION CHECKER---")
    
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    ## SOURCE DOCUMENTS HANDLER ##
    ## create a new array with source file and page content snippet for use in GUI output
    
    global filtered_docs_formatted
    filtered_docs_formatted = []

    for rank, doc in enumerate(documents, start=1):  # Ensure documents is defined
        source = doc.metadata['source']
        page_content_snippet = doc.page_content[:200]  # Get the first 200 characters of the snippet
        color = calculate_nvidia_green_intensity(rank, max_rank=len(documents))  # Dynamic rank-based green

        # Append the formatted HTML to the list
        filtered_docs_formatted.append(f'''
        <table style="width: 100%; border-collapse: collapse; background-color: #fff; margin-bottom: 5px;">
            <tr>
                <td style="width: 5%; background-color: {color}; text-align: center; font-weight: bold; color: white; padding: 5px;">
                    {rank}
                </td>
                <td style="padding: 5px;">
                    <div style="font-weight: bold;">Source:</div>
                    <a href="{source}" target="_blank" class="custom-link">{source}</a>
                    <br>
                    <div style="margin-top: 5px;"><b>Snippet</b>: {page_content_snippet}</div>
                </td>
            </tr>
        </table>
        ''')


    
    ### HALLUCINATION CHECK
    ### first starts with hallucination grader which compares the answer to the documents
    ### a grade of YES means grounded in documents.  
    ### a grade of NO would indicate not grounded in docs and would qualify as an hallucination.

    ### USEFULNESS CHECK
    ### then a usefulness check that compares the answer to the question to ensure it actually answers the question.
    ### a grade of YES means the answer addresses the question 
    ### a grade of NO would indicate the answer fails to address the question.


    # Status message
    global hallucination_status, usefulness_status, formatted_usefulness_table
    
    # HTML table template
    usefulness_table_template = """
    <table border="1">
      <tr>
        <th>Status</th>
        <th>Task</th>
      </tr>
      <tr>
        <td style="color: {color1};">{status1}</td>
        <td>Answer is grounded in relevant documents</td>
      </tr>
      <tr>
        <td style="color: {color2};">{status2}</td>
        <td>Answer effectively addresses the question</td>
      </tr>
    </table>
    """
    
    score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score["score"]
    
    # Check whether or not answer is grounded in documents and no hallucinations, yes is pass, no is fail.
    if grade == "yes":
        print("---DECISION: ANSWER IS GROUNDED IN DOCUMENTS - NO HALLUCINATIONS---")
    
        hallucination_status = "success"
    
        # After hallucination check has passed, now check whether the answer addresses the question
        print("---GRADE ANSWER vs THE QUESTION---")
        
        score = answer_grader.invoke({"question": question, "generation": generation})
        
        grade = score["score"]
    
        # Evaluate the question-answering score
        if grade == "yes":
            print("---DECISION: ANSWER ADDRESSES QUESTION AND IS USEFUL---")
            usefulness_status = "success"
            
            # Update the HTML table
            formatted_usefulness_table = usefulness_table_template.format(color1="green", status1="&#10004;", color2="green", status2="&#10004;")
            
            return "useful"
            
        else:
            print("---DECISION: ANSWER DOES NOT ADDRESS QUESTION---")
            
            # Update the HTML table
            formatted_usefulness_table = usefulness_table_template.format(color1="green", status1="&#10004;", color2="grey", status2="&#10008;")
            
            return "not useful"
    
    # If it's hallucinating, and answer is not related to documents, retry
    else:
        print("---DECISION: POSSIBLE HALLUCINATIONS - ANSWER IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        
        # Update the HTML table
        formatted_usefulness_table = usefulness_table_template.format(color1="grey", status1="&#10008;", color2="grey", status2="&#10008;")
        
        return "not supported"

### Langgraph node definitions

In [59]:
from langgraph.graph import END, StateGraph

workflow = StateGraph(GraphState)

# Define the nodes

workflow.add_node("websearch", web_search)  # web search
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("rerank", rerank)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate

### Langgraph build node relations

In [60]:
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "websearch",
        "vectorstore": "retrieve",
    },
)
workflow.add_edge("retrieve", "rerank")
workflow.add_edge("rerank", "grade_documents")

# workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "websearch": "websearch",
        "generate": "generate",
    },
)
workflow.add_edge("websearch", "generate")

workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "useful": END,
        "not useful": "websearch",
    },
)

# Agentic Response Function for GUI


### Status panel formatting

In [61]:
import io
import sys
import time
from contextlib import redirect_stdout

### used to format and color the status alert

def status_update(status):
    if status == "success":
        return '<div style="background-color: green; color: white; padding: 5px; border-radius: 5px;">Completed successfully</div>'
    elif status in ["websearch", "vectorstore"]:
        return f'<div style="background-color: blue; color: white; padding: 5px; border-radius: 5px;">{status}</div>'
    else:
        return '<div style="background-color: grey; color: white; padding: 5px; border-radius: 5px;">Not used</div>'


### MAIN RESPONSE FUNCTION
This takes the inputs from the GUI and triggers all the other external functions.

In [62]:
def get_agentic_response(question, mode_toggle):

   
    ### Create and set status globals
    ### reset value of previous variable values upon new execution of main function
    
    global router_status, router_choice, retrieve_status, rerank_status, relevance_status, web_fallback_status, websearch_status, hallucination_status, usefulness_status, filtered_docs_formatted, answer_reliability_content, routing_agent_panel_content, relevance_report_msg, formatted_usefulness_table
    
    # router_status = None
    # router_choice = None
    # retrieve_status = None
    # rerank_status = None
    # relevance_status = None
    # web_fallback_status = None
    # websearch_status = None
    
    hallucination_status = None
    usefulness_status = None
    routing_agent_panel_content = None
    relevance_report_msg = None
    answer_reliability_content = None
    

    # print(routing_agent_panel_content)

    
    # Compile the workflow
    app = workflow.compile()

    # Prepare the input
    inputs = {"question": question}

    # Initialize the response variable
    response = None


    # Stream the output from the app

    for output in app.stream(inputs):
        for key, value in output.items():
            # Check if 'generation' key is in the value
            if 'generation' in value:
                response = value['generation']

    
    graded_response = f"Response:\n{response}"

    ### bring in filtered docs formatted from outside function, join the contents to make it look better in textbox in GUI
    
    filtered_docs_content = "\n\n".join(filtered_docs_formatted)

    
    
    # bring in Status messages for indicator panel

    hallucination_status_result = status_update(hallucination_status)
    usefulness_status_result = status_update(usefulness_status)


    # if using LLM-toggle turn off the agent status messages since they aren't used
    if mode_toggle == "LLM only mode":
        llm_response = get_llm_response(question)
        # router_status_result = "Not used"
        # router_choice_result = "Not used"
        # retrieve_status_result = "Not used"
        # rerank_status_result = "Not used"
        # relevance_status = "Not used"
        # web_fallback_status = "Not used"
        # websearch_status = "Not used"
        # hallucination_status = "Not used"
        # usefulness_status = "Not used"
        # relevance_status_result = "Not used"
        # web_fallback_status_result = "Not used"
        # websearch_status_result = "Not used"
        
        hallucination_status_result = "Not used"
        usefulness_status_result = "Not used"
        relevance_report_msg = "LLM only, no relevance check"
        llm_only_source_result = "LLM only, no source docs"
        routing_agent_panel_content = "LLM only, no routing"
        answer_reliability_content = "LLM only, cannot verify answer"
        formatted_usefulness_table = "LLM only, cannot verify answer"
        
        return ( 
            llm_response, 
            routing_agent_panel_content, 
            answer_reliability_content, 
            relevance_report_msg, 
            hallucination_status_result, 
            usefulness_status_result, 
            formatted_usefulness_table, 
            llm_only_source_result
        )

                     

    
    answer_reliability_content = answer_reliability_meter(relevant_count)
        
###  the return statement of the function needs to return in the order that matches the outputs in gradio]
### in this case we are going from top left to left bottom.  To top right, to right bottom.

    return (
        response, 
        routing_agent_panel_content, 
        answer_reliability_content, 
        relevance_report_msg, 
        hallucination_status_result, 
        usefulness_status_result, 
        formatted_usefulness_table, 
        filtered_docs_content
    )


### Example question array for GUI Example questions

In [63]:
# Edit data below for specific demos ----
EXAMPLE_TITLES = [
                     "### Vector Search",
                     "### Web Fallback",
                     "### Web Search",
                 ]
EXAMPLES = [

###Vector Search

               [
                   
        "Create an email to the head nurse that summarizes the patients admitted for heart related issues.",
        "What can you say about sunscreen effectiveness in preventing melanoma?",
        "Please summarize the clinical trial info we have on our drug ipilimumab for melanoma.",
        "What are the key domains of population-based approaches to mental health?",

               ],

### Web Fallback

               [
        "What are the FDA-approved treatments for skin cancer in 2024?",

               ],

### Web Search

		       [ 
        "What year did the Bears football team win the super bowl?",
        "What is the chemical makeup of water?"
               ],

        ]

# GUI setup

In [None]:
import gradio as gr
import base64

# Placeholder URLs for the logos
DELL_LOGO_URL = "https://upload.wikimedia.org/wikipedia/commons/4/48/Dell_Logo.svg"
NVIDIA_LOGO_URL = "https://upload.wikimedia.org/wikipedia/commons/a/a4/NVIDIA_logo.svg"

# Load the NVIDIA logo using base64 encoding
def get_image_base64(image_path):
    try:
        with open(image_path, "rb") as img_file:
            return base64.b64encode(img_file.read()).decode('utf-8')
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        return None

# Path to your local NVIDIA logo
nvidia_logo_path = os.path.join("images", "nvidia-logo.png")
nvidia_base64 = get_image_base64(nvidia_logo_path)
dell_logo_path = os.path.join("images", "dell-logo.png")
dell_base64 = get_image_base64(dell_logo_path)

def clear_fields():
    return "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""

def check_question(question):
    if not question.strip():
        gr.Warning("No question entered, please input a question.")

with gr.Blocks(theme=gr.themes.Default(), title="Health Clinic Assistant") as demo:

    # Custom CSS for styling
    style = '''
    <style>
        /* Common styles for both modes */
        .custom-html {
            border-radius: 5px;
            padding: 8px;
            height: 180px;
            overflow-y: auto;
        }
        .logo-container {
            display: flex;
            align-items: center;
        }
        .logo-container img {
            height: 40px;
            width: auto;
            margin-right: 15px;
        }
        .custom-link {
            text-decoration: none;
            font-weight: bold;
        }
        .spaced-column {
            padding-right: 10px; /* Adds space between columns */
        }
        .status-panel {
            border-radius: 5px;
            padding: 8px;
            height: 50px;
            display: flex;
            align-items: center;
            justify-content: center;
            width: 100%;
            text-align: center;
        }
        
        /* Light mode - using Gradio's light theme class */
        .light-theme .custom-html {
            border: 1px solid #ccc;
            background: #fff;
        }
        .light-theme body, .light-theme .gradio-container {
            background-color: #f4f4f4;
            color: #333;
            font-family: Arial, sans-serif;
        }
        .light-theme .custom-link {
            color: #76B900;
        }
        .light-theme .status-panel {
            border: 2px solid #ccc;
            background-color: white;
            color: #333;
        }
        
        /* Dark mode - using Gradio's dark theme class */
        .dark-theme .custom-html {
            border: 1px solid #444;
            background: #2d2d2d;
        }
        .dark-theme body, .dark-theme .gradio-container {
            background-color: #1a1a1a;
            color: white;
            font-family: Arial, sans-serif;
        }
        .dark-theme .custom-link {
            color: #9eff00;
        }
        .dark-theme .status-panel {
            border: 2px solid #444;
            background-color: #2d2d2d;
            color: white;
        }
        /* These important flags ensure text inputs are readable in dark mode */
        .dark-theme input, 
        .dark-theme textarea, 
        .dark-theme select, 
        .dark-theme .gradio-textbox textarea, 
        .dark-theme .gradio-textbox input {
            color: white !important;
            background-color: #2d2d2d !important;
        }
        /* Make sure buttons have proper contrast */
        .dark-theme button, 
        .dark-theme .gradio-button {
            background-color: #2d2d2d !important;
            color: white !important;
            border-color: #444 !important;
        }
    </style>
    '''
    gr.HTML(style)

    # TITLE and LOGOS
    with gr.Row():
        gr.HTML(f"""
        <div class="logo-container">
            <img src="data:image/png;base64,{dell_base64}" alt="Dell Logo">            
            <img src="data:image/png;base64,{nvidia_base64}" alt="NVIDIA Logo">
        </div>
        <h2>Gen AI Health Assistant - Dell Technologies & NVIDIA</h2>
        <p>Dataset contains journals on COVID, Skin Cancer and Mental Health as well as simulated clinic patient data</p>
        """)
    
    # MAIN ROW AFTER TITLE
    with gr.Row():
        with gr.Column(scale=1):

            # QUESTION DROPDOWN
            question_dropdown = gr.Dropdown(choices=[question for section in EXAMPLES for question in section],
                                            label="Select a Question", 
                                            interactive=True)

    
            # MODE
            mode_toggle = gr.Dropdown(choices=["Agentic RAG mode", "LLM only mode"], 
                                      label="Select Mode", value="Agentic RAG mode", 
                                      interactive=True)
            
            # QUESTION
            question = gr.Textbox(label="Prompt", placeholder="Enter your question here...", lines=2, max_lines=2)


            # Auto-populate selected question into the textbox
            question_dropdown.change(
                fn=lambda q: q,
                inputs=[question_dropdown],
                outputs=[question]
            )

            # BUTTONS
            with gr.Row():  
                submit_button = gr.Button("Submit")
                clear_button = gr.Button("Clear")
                stop_btn = gr.Button("Stop Process")

            # RESPONSE
            response = gr.Textbox(label="Response", lines=16, max_lines=16)

        ################### RIGHT COLUMN
        with gr.Column(scale=1):  

            # ROW FOR STATUS PANELS (Reliability Meter, Hallucination, Usefulness) - FIXED
            with gr.Row(equal_height=True):
                with gr.Column(scale=4, min_width=270, elem_classes="spaced-column"):
                    gr.Markdown("#### Answer Reliability Meter")
                    answer_reliability_content = gr.HTML("<div class='status-panel'>Reliability Content</div>")
                
                with gr.Column(scale=2, min_width=100, elem_classes="spaced-column"):
                    gr.Markdown("#### Hallucination Check")
                    hallucination_status_result = gr.HTML("<div class='status-panel'>Hallucination Check Result</div>")
                
                with gr.Column(scale=2, min_width=100):
                    gr.Markdown("#### Usefulness Check")
                    usefulness_status_result = gr.HTML("<div class='status-panel'>Usefulness Check Result</div>")


            gr.Markdown("<b>Routing Agent Panel</b>")
            with gr.Accordion("See Details", open=False):  
                routing_agent_panel_content = gr.HTML("<div>Routing Agent Content</div>")
                relevance_report_msg = gr.HTML("<div>Relevance Report Content</div>")

            gr.Markdown("<b>Rerank Agent Sources and Scores</b>")
            with gr.Accordion("See Details", open=False):  
                with gr.Row():
                    with gr.Column(scale=2):
                        filtered_docs_content = gr.HTML("<div>Rerank Agent Sources and Scores Content</div>")

            gr.Markdown("#### Usefulness Table")
            formatted_usefulness_table = gr.HTML("<div>Usefulness Table Content</div>")

    gr.Markdown("<hr>")
    gr.Markdown("<hr>")
    
    warning_popup = gr.HTML("<div style='color: red;'>Please input question</div>", visible=False)

    start_event = submit_button.click(
        get_agentic_response, 
        inputs=[question, mode_toggle], 
        outputs=[response,
                 routing_agent_panel_content,
                 answer_reliability_content,
                 relevance_report_msg,
                 hallucination_status_result, 
                 usefulness_status_result,
                 formatted_usefulness_table,
                 filtered_docs_content,
                ]
    )
    
    submit_button.click(
        check_question, 
        inputs=[question], 
        outputs=[warning_popup]
    )
    
    clear_button.click(
        clear_fields, 
        inputs=[], 
        outputs=[response,
                 routing_agent_panel_content,
                 answer_reliability_content,
                 relevance_report_msg,
                 hallucination_status_result, 
                 usefulness_status_result,
                 formatted_usefulness_table,
                 filtered_docs_content,
                 warning_popup
                ]
    )    
    
    stop_btn.click(
        fn=None, 
        inputs=None, 
        outputs=None, 
        cancels=[start_event]
    )

demo.queue(max_size=25)

current_dir = os.getcwd()
images_path = os.path.join(current_dir, "images")

demo.launch(share=False, debug=True, server_name="0.0.0.0", server_port=7869, allowed_paths=[images_path])

* Running on local URL:  http://0.0.0.0:7869

To create a public link, set `share=True` in `launch()`.
