# 1. Description of the project

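This notebook builds the retrieval component of a RAG (retrieval-augmented generation) system and combines it with LettuceDetect to flag the parts of an answer that the retrieved documents do not support.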

# 2. Setup

1. **Install these packages:**

In [None]:
%pip install -qq langchain langchain-unstructured langchain-chroma langchain-openai unstructured langchain-community "unstructured[pdf]" python-dotenv lettucedetect gradio

2. **Import the necessary modules**

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_unstructured import UnstructuredLoader
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma.vectorstores import Chroma
from langchain_community.vectorstores.utils import filter_complex_metadata
import os
from dotenv import load_dotenv, find_dotenv
from lettucedetect.models.inference import HallucinationDetector
import gradio as gr

3. **Deploy an Azure OpenAI LLM resource and embedding resource**

    Use the following link: https://ai.azure.com/
4. **Save the details to the .env file:**
    ```bash
    echo AZURE_OPENAI_API_KEY=\"your-api-key-here\" >> .env
    echo AZURE_OPENAI_API_VERSION=\"your-version-here\" >> .env
    echo AZURE_OPENAI_ENDPOINT=\"your-endpoint-here\" >> .env
    echo GPT_MODEL=\"your-llm-model-here\" >> .env
    echo EMBEDDINGS_MODEL_NAME=\"your-embeddings-model-here\" >> .env
    echo EMBEDDINGS_DEPLOYMENT=\"your-embeddings-deployment-here\" >> .env
    ```

# 3. ChromaDB setup

## 3.1 The text splitter

The text splitter divides documents into manageable chunks to optimize downstream processing and retrieval in RAG workflows.

In [None]:

def text_splitter(data, debug = False):
    # Split documents into chunks of up to 1000 characters with a 50-character overlap
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=50,
        length_function=len,
    )
    if debug:
        print(f"Splitting {len(data)} documents into chunks...")
    chunks = splitter.split_documents(data)
    return chunks
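
To illustrate how `chunk_size` and `chunk_overlap` interact, here is a minimal sketch of a plain sliding-window chunker. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to cut at separators such as paragraph breaks, newlines and spaces. The function name `sliding_window_chunks` and the sample text are made up for this example.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Small values make the overlap easy to see
sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears partly in both neighbours.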

## 3.2 The document loader

The document loader reads and parses files from the corpus directory into structured document objects for downstream processing.

In [None]:
def load_documents(corpus_dir = "./corpus/", debug = False):
    # Load all documents from the corpus directory
    loaded_docs = []
    if debug:
        print(f"Loading documents from {corpus_dir}...")
    for file in os.listdir(corpus_dir):
        if debug:
            print(f"Loading {file}...")
        # os.path.join works whether or not corpus_dir ends with a slash
        loader = UnstructuredLoader(os.path.join(corpus_dir, file), mode = 'single')
        loaded_docs.extend(loader.load())

    # Filter complex metadata from loaded documents
    if debug:
        print("Filtering complex metadata...")
    filtered_docs = filter_complex_metadata(loaded_docs)

    return filtered_docs

## 3.3 The embedding client

The embedding client initializes and manages Azure OpenAI embeddings for converting text into vector representations.

In [None]:
def embeddings(debug = False):
    load_dotenv(find_dotenv())
    model = os.getenv('EMBEDDINGS_MODEL_NAME')
    api_key = os.getenv('AZURE_OPENAI_API_KEY')
    api_version = os.getenv("AZURE_OPENAI_API_VERSION")
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    azure_deployment = os.getenv("EMBEDDINGS_DEPLOYMENT")

    # Validate required environment variables
    if not all([model, api_key, api_version, azure_endpoint, azure_deployment]):
        raise ValueError(
            """
            Missing environment variables.
            Please load all the required environment variables in the .env file:
            EMBEDDINGS_MODEL_NAME, AZURE_OPENAI_API_KEY, AZURE_OPENAI_API_VERSION,
            AZURE_OPENAI_ENDPOINT, EMBEDDINGS_DEPLOYMENT
            """
        )
    
    # Initialize and return an Azure OpenAI embeddings client
    if debug:
        print(f"Initializing embeddings with model: {model}, deployment: {azure_deployment}")
    embeddings = AzureOpenAIEmbeddings(
        model = model,
        api_key = api_key,
        api_version = api_version,
        azure_endpoint = azure_endpoint,
        azure_deployment = azure_deployment,
    )
    return embeddings

## 3.4 The vector database

The vector database stores document embeddings for fast similarity search and retrieval. Built with Chroma, it enables efficient access to relevant document chunks in RAG workflows.

In [None]:
def create_database(document_list, database_dir = "./chroma_db", debug = False):
    # Initialize the database from a given corpus of documents
    embedding_model = embeddings(debug = debug)

    # Create the database if it doesn't exist
    if not os.path.exists(database_dir) or not os.listdir(database_dir):
        if debug:
            print(f"Creating vector database with {len(document_list)} documents...")
        return Chroma.from_documents(documents = document_list,
                                     embedding = embedding_model,
                                     persist_directory = database_dir)

    # If the database already exists, reuse it as-is; document_list is not added to it
    return Chroma(persist_directory = database_dir,
                  embedding_function = embedding_model)

## 3.5 The retriever

The retriever fetches relevant document chunks from the vector database. It embeds the user query and returns the chunks whose embeddings are most similar to it. No LLM is involved in this step: retrieval relies purely on similarity search.

In [None]:
def retriever(corpus_dir = "./corpus/", debug = False):
    docs = load_documents(corpus_dir, debug = debug)
    chunks = text_splitter(docs, debug = debug)
    vectordb = create_database(chunks, debug = debug)
    if debug:
        print("Creating retriever from vector database...")
    retriever = vectordb.as_retriever()
    return retriever
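
The similarity search behind the retriever can be sketched with toy data. The example below ranks chunks by cosine similarity between hand-written three-dimensional vectors, which stand in for real embeddings. It is only an illustration: the metric and index that Chroma actually uses depend on its configuration, and the names `cosine_similarity`, `top_k` and `toy_index` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: (chunk text, made-up embedding)
toy_index = [
    ("engine specifications", [0.9, 0.1, 0.0]),
    ("infotainment display", [0.1, 0.9, 0.1]),
    ("warranty terms", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.0]  # made-up embedding of an engine-related question

best_matches = top_k(toy_query, toy_index, k=2)
print(best_matches)
```

The query vector points in almost the same direction as the first chunk, so that chunk ranks first regardless of the wording of either text.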

## 3.6 Small test

In [None]:
test_retriever = retriever(corpus_dir = "./example_inputs/", debug = True)

In [None]:
# invoke() replaces the deprecated get_relevant_documents()
docs = test_retriever.invoke("Which car features a turbocharged 3.0-liter inline-six engine?")
docs

# 4. The hallucination detector

In [None]:
def detect_hallucinations(context, question, answer, debug = False):
    if debug:
        print(f"Predicting hallucination for question: {question}")

    # Initialize the hallucination detector with a transformer model
    detector = HallucinationDetector(
        method="transformer",
        model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1"
    )

    # Predict hallucination using the detector
    result = detector.predict(context = context,
                              question = question,
                              answer = answer,
                              output_format = "spans")
    return result
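
A possible way to inspect the detector output is to mark the flagged spans inside the answer. The sketch below assumes each span is a dictionary with `start` and `end` character offsets into the answer, in addition to the `text` and `confidence` keys used later in this notebook. The offsets are an assumption that should be checked against the installed LettuceDetect version, and `highlight_spans` is a hypothetical helper, not part of the library.

```python
def highlight_spans(answer: str, spans: list[dict]) -> str:
    """Wrap every flagged span of the answer in [[ ]] markers."""
    marked = answer
    # Process spans from right to left so earlier offsets stay valid
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        start, end = span["start"], span["end"]
        marked = marked[:start] + "[[" + marked[start:end] + "]]" + marked[end:]
    return marked


# Hand-written span for illustration, not real detector output
example_answer = "The Audi A4 2024 features a 12.3-inch display."
example_spans = [
    {"start": 0, "end": 16, "text": "The Audi A4 2024", "confidence": 0.97},
]

highlighted = highlight_spans(example_answer, example_spans)
print(highlighted)
```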

# 5. The interface

In [None]:
def gradio_backend(uploaded_files, question, answer, debug = False):
    # Create a new corpus directory
    corpus_dir = "./corpus/"
    if os.path.exists(corpus_dir):
        for f in os.listdir(corpus_dir):
            os.remove(os.path.join(corpus_dir, f))
    os.makedirs(corpus_dir, exist_ok=True)

    # Save the uploaded files to the corpus directory
    for file in uploaded_files:
        # Gradio may pass plain path strings or file objects exposing .name
        src_path = getattr(file, "name", file)
        filename = os.path.basename(src_path)
        # Context managers close both files even if the copy fails
        with open(src_path, "rb") as src, open(os.path.join(corpus_dir, filename), "wb") as dst:
            dst.write(src.read())

    if debug:
        print(f"Detecting hallucination for question: {question} with answer: {answer}")
    
    # Initialize the retriever with the corpus directory
    doc_retriever = retriever(corpus_dir = corpus_dir, debug = debug)

    # Retrieve relevant documents from the corpus
    if debug:
        print(f"Retrieving relevant documents for question: {question}")
    # invoke() replaces the deprecated get_relevant_documents()
    retrieved_docs = doc_retriever.invoke(question)

    # Predict hallucinations with detect_hallucinations, which takes the context as a list of strings
    detected_hallucination = detect_hallucinations(
        context = [doc.page_content for doc in retrieved_docs],
        question = question,
        answer = answer,
        debug = debug
    )

    # Check whether any hallucinated spans were detected
    if detected_hallucination:
        hallucination_was_found = "Hallucinations detected"
    else:
        hallucination_was_found = "No hallucinations found"

    # Build one output line per hallucinated span
    hallucination_str = "\n".join(
        f"'{hallucination['text']}' - Confidence = {hallucination['confidence']}"
        for hallucination in detected_hallucination
    )

    return hallucination_was_found, hallucination_str


In [None]:
rag_application = gr.Interface(
    fn = gradio_backend,
    allow_flagging = "never",
    inputs = [
        # Drag and drop files, returns a list of file paths
        gr.File(label = "Upload PDF/txt files",
                file_count = 'multiple',
                file_types = ['.pdf', '.txt']),
        gr.Textbox(label = "Prompt",
                   placeholder = "Type your question here..."),
        gr.Textbox(label = "Answer",
                   lines = 3,
                   placeholder = "Type the answer here..."),
    ],
    outputs = [
        gr.Textbox(label = "Status"),
        gr.Textbox(label = "Detected Hallucinations",
                   lines = 5)
    ],
    title = "RAG system with Hallucination Detection",
    description = """
        Upload PDF or text documents, then enter a question and the answer you want to check.
        The app flags the parts of the answer that the retrieved passages do not support.
    """
)

rag_application.launch()

# 6. Results

### Test

**Question:** Which car model features a 12.3-inch high-resolution digital instrument display that replaces traditional analog gauges?

**Answer (correct):** The Audi A4 2024 features a 12.3-inch high-resolution digital instrument display that replaces traditional analog gauges.

**Status:** Hallucinations detected

**Detected hallucinations:** 'The Audi A4 2024' - Confidence = 0.9719486832618713

We retrieve the relevant documents for the query in order to understand this result:

In [None]:
test_retriever = retriever(corpus_dir = "./example_inputs/", debug = True)
# invoke() replaces the deprecated get_relevant_documents()
docs = test_retriever.invoke("Which car model features a 12.3-inch high-resolution digital instrument display that replaces traditional analog gauges?")
docs

### Conclusion

LettuceDetect flags a span as hallucinated when the supplied context, which it assumes to be true, does not support it. It does not judge whether the evaluated answer is factually valid.

When combined with a RAG system, it should therefore be used as a pre-processing step that points out possible conflicts with a given corpus, without assuming that the flagged spans are necessarily wrong. In the previous example, the name of the car is correct. However, it does not appear in the retrieved chunks (which cover the digital instrument display asked about), so it is flagged as a hallucination.

Later on, these conflicts can be evaluated by LLMs to determine their factual accuracy and provide an enhanced response.
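
As a rough sketch of that follow-up step, the flagged spans could be turned into a review prompt for an LLM. Everything here is hypothetical: the helper name `build_review_prompt`, the prompt wording and the `min_confidence` threshold of 0.5 are illustrative choices, not part of LettuceDetect or of this notebook's pipeline, and the LLM call itself is left out.

```python
def build_review_prompt(question, answer, flagged_spans, min_confidence=0.5):
    """Return a prompt asking an LLM to fact-check the flagged spans, or None if none qualify."""
    # Keep only spans the detector is reasonably confident about (threshold is arbitrary)
    kept = [s for s in flagged_spans if s["confidence"] >= min_confidence]
    if not kept:
        return None
    bullet_list = "\n".join(
        f"- {s['text']!r} (confidence {s['confidence']:.2f})" for s in kept
    )
    return (
        "The following parts of an answer are not supported by the retrieved context.\n"
        "For each one, state whether it is factually correct.\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Flagged spans:\n{bullet_list}"
    )


# Span taken from the test above, with the confidence rounded
review_prompt = build_review_prompt(
    question="Which car model features a 12.3-inch digital instrument display?",
    answer="The Audi A4 2024 features a 12.3-inch digital instrument display.",
    flagged_spans=[{"text": "The Audi A4 2024", "confidence": 0.9719}],
)
print(review_prompt)
```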