# Explainable Retrieval in Document Search

## Overview

This code implements an Explainable Retriever: a system that retrieves documents relevant to a query and also explains why each retrieved document is relevant. It combines vector similarity search with LLM-generated natural-language explanations, which makes the retrieval process more transparent and interpretable.

## Motivation

Traditional document retrieval systems often work as black boxes, providing results without explaining why they were chosen. This lack of transparency can be problematic in scenarios where understanding the reasoning behind the results is crucial. The Explainable Retriever addresses this by offering insights into the relevance of each retrieved document.

## Key Components

1. Vector store creation from input texts
2. Base retriever using Chroma for efficient similarity search
3. Large language model (LLM) for generating explanations
4. Custom ExplainableRetriever class that combines retrieval and explanation generation

## Method Details

### Document Preprocessing and Vector Store Creation

1. Input texts are converted into embeddings using an Azure OpenAI embedding model (`text-embedding-3-large` in the code below).
2. A Chroma vector store is created from these embeddings for efficient similarity search.
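
To make the embed-then-search idea concrete without an API key, here is a minimal sketch that swaps in a toy bag-of-words "embedding" for the OpenAI model and a hypothetical in-memory `ToyVectorStore` for Chroma. Real embeddings are dense learned vectors, so only the mechanics (vectorize every text once, then rank by cosine similarity to the query vector) carry over.

```python
import math
import re
from collections import Counter


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical minimal vector store: keeps (text, vector) pairs in memory."""

    def __init__(self, texts: list[str]):
        # Shared vocabulary so every vector has the same dimensions;
        # query words outside this vocabulary are simply ignored.
        self.vocab = sorted({w for t in texts for w in re.findall(r"[a-z]+", t.lower())})
        self.entries = [(t, embed(t, self.vocab)) for t in texts]

    def similarity_search(self, query: str, k: int) -> list[str]:
        q = embed(query, self.vocab)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


texts = [
    "The sky is blue because of the way sunlight interacts with the atmosphere.",
    "Photosynthesis is the process by which plants use sunlight to produce energy.",
    "Global warming is caused by the increase of greenhouse gases in Earth's atmosphere.",
]
store = ToyVectorStore(texts)
print(store.similarity_search("Why is the sky blue?", k=1))  # the sky sentence ranks first
```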

### Retriever Setup

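1. A base retriever is created from the vector store, configured to return the top 5 most similar documents (`k=5`). If the store holds fewer than five documents, as in the three-text example below, all of them are returned.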
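
The top-k step can be sketched in isolation as below. `top_k` is a hypothetical helper, and the scores are made-up numbers for illustration, not output from any model. `heapq.nlargest` already copes with `k` larger than the input; the explicit `min` just makes the cap visible, matching the way a small store returns fewer than `k` documents.

```python
import heapq


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, highest similarity first."""
    k = min(k, len(scores))  # never ask for more documents than the store holds
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Made-up similarity scores, for illustration only.
scores = {"doc-a": 0.69, "doc-b": 0.29, "doc-c": 0.27}
print(top_k(scores, 5))  # three pairs, highest score first
```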

### Explanation Generation

1. An LLM (AzureChatOpenAI) is used to generate explanations.
2. A custom prompt template is defined to guide the LLM in explaining the relevance of retrieved documents.
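
LangChain's `PromptTemplate` uses f-string-style `{placeholders}` by default, so the prompt the LLM receives can be previewed with plain `str.format`. The sketch below reuses the notebook's template text; `EXPLAIN_TEMPLATE` and `build_explain_prompt` are hypothetical names introduced here, not part of the original code.

```python
# Same wording as the notebook's explanation prompt, without the source indentation.
EXPLAIN_TEMPLATE = """Analyze the relationship between the following query and the retrieved context.
Explain why this context is relevant to the query and how it might help answer the query.

Query: {query}

Context: {context}

Explanation:"""


def build_explain_prompt(query: str, context: str) -> str:
    """Fill the two placeholders exactly as the prompt template would."""
    return EXPLAIN_TEMPLATE.format(query=query, context=context)


prompt = build_explain_prompt(
    "Why is the sky blue?",
    "The sky is blue because of the way sunlight interacts with the atmosphere.",
)
print(prompt)
```

Ending the prompt with `Explanation:` nudges the model to answer directly with the explanation text.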

### ExplainableRetriever Class

1. Combines the base retriever and explanation generation into a single interface.
2. The `retrieve_and_explain` method:
   - Retrieves relevant documents using the base retriever.
   - For each retrieved document, generates an explanation of its relevance to the query with a separate LLM call.
   - Returns a list of dictionaries, each with a `content` key (the document text) and an `explanation` key (the LLM's explanation).
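
The control flow of `retrieve_and_explain` can be sketched with the retriever and the explanation chain passed in as plain callables. Both `fake_retrieve` and `fake_explain` are hypothetical stubs standing in for Chroma and the LLM, so the example runs offline and shows only the loop and the output shape.

```python
from typing import Callable


def retrieve_and_explain(
    query: str,
    retrieve: Callable[[str], list[str]],
    explain: Callable[[str, str], str],
) -> list[dict[str, str]]:
    """One explanation call per retrieved document, mirroring the notebook's loop."""
    results = []
    for content in retrieve(query):
        results.append({"content": content, "explanation": explain(query, content)})
    return results


# Hypothetical stubs in place of the vector-store retriever and the LLM chain.
def fake_retrieve(query: str) -> list[str]:
    return ["The sky is blue because of the way sunlight interacts with the atmosphere."]


def fake_explain(query: str, context: str) -> str:
    return f"The context directly addresses '{query}'."


results = retrieve_and_explain("Why is the sky blue?", fake_retrieve, fake_explain)
print(results[0]["explanation"])
```

Because every document triggers its own LLM call, `k` controls not only how many results come back but also the latency and cost of each query.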

## Benefits of this Approach

1. Transparency: Users see a stated reason why each retrieved document is relevant to their query.
2. Trust: Explanations build user confidence in the system's results.
3. Learning: Users can gain insights into the relationships between queries and documents.
4. Debugging: Easier to identify and correct issues in the retrieval process.
5. Customization: The explanation prompt can be tailored for different use cases or domains.
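
As a sketch of the customization point, the instruction line of the prompt could be swapped per domain while the `{query}` and `{context}` placeholders stay fixed. `DOMAIN_INSTRUCTIONS` and `make_template` are hypothetical, and the domain wording is illustrative only; the returned string could then be passed as the `template` argument of `PromptTemplate`.

```python
# Hypothetical per-domain instructions; the wording is illustrative, not tested prompts.
DOMAIN_INSTRUCTIONS = {
    "general": "Explain why this context is relevant to the query and how it might help answer the query.",
    "legal": "Explain which facts or provisions in this context bear on the query, and note any gaps.",
    "education": "Explain, in terms a student could follow, how this context helps answer the query.",
}


def make_template(domain: str = "general") -> str:
    """Build an explanation template; unknown domains fall back to the general wording."""
    instruction = DOMAIN_INSTRUCTIONS.get(domain, DOMAIN_INSTRUCTIONS["general"])
    # Only the middle literal is an f-string, so {query} and {context} stay as placeholders.
    return (
        "Analyze the relationship between the following query and the retrieved context.\n"
        f"{instruction}\n\n"
        "Query: {query}\n\n"
        "Context: {context}\n\n"
        "Explanation:"
    )


print(make_template("legal").format(query="Is the clause enforceable?", context="..."))
```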

## Conclusion

The Explainable Retriever makes information retrieval more interpretable and easier to trust by pairing each retrieved document with a natural-language explanation of its relevance. In doing so, it bridges the gap between vector-based search techniques and human understanding. The approach suits fields where the reasoning behind a retrieval result matters as much as the result itself, such as legal research, medical information systems, and educational tools.

In [3]:
from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_chroma import Chroma
from langchain_openai import AzureOpenAIEmbeddings
from os import environ

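class ExplainableRetriever:
    """Retrieve documents for a query and ask an LLM to explain each one's relevance."""

    def __init__(self, texts):
        # Reads AZURE_OPENAI_* credentials from the environment; the model name should match your Azure deployment (adjust as needed)
        self.embeddings = AzureOpenAIEmbeddings(model="text-embedding-3-large")
        self.vectorstore = Chroma.from_texts(texts, self.embeddings)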

        # Create a base retriever
        self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 5})

        # Create an explanation chain using Azure OpenAI
        explain_prompt = PromptTemplate(
            input_variables=["query", "context"],
            template="""Analyze the relationship between the following query and the retrieved context.
Explain why this context is relevant to the query and how it might help answer the query.

Query: {query}

Context: {context}

Explanation:"""
        )

        # Use AzureChatOpenAI from LangChain; credentials and deployment come from environment variables
        self.llm = AzureChatOpenAI(
            temperature=0.5,
            max_tokens=None,
            timeout=None,
            max_retries=2,
            api_key=environ['AZURE_OPENAI_API_KEY'],
            api_version="2023-03-15-preview",
            azure_endpoint=environ['AZURE_OPENAI_ENDPOINT'],
            azure_deployment=environ['AZURE_OPENAI_MODEL_DEPLOYMENT'],
        )

        # Pipe the prompt into the LLM to form the explanation chain
        self.explain_chain = explain_prompt | self.llm

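    def retrieve_and_explain(self, query):
        """Return a list of {"content", "explanation"} dicts for the top-k documents."""
        # Retrieve relevant documents (retriever.invoke replaces the deprecated get_relevant_documents)
        docs = self.retriever.invoke(query)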

        explained_results = []

        for doc in docs:
            # Generate explanation
            input_data = {"query": query, "context": doc.page_content}
            explanation = self.explain_chain.invoke(input_data).content

            explained_results.append({
                "content": doc.page_content,
                "explanation": explanation
            })

        return explained_results


In [4]:
# Usage example
texts = [
    "The sky is blue because of the way sunlight interacts with the atmosphere.",
    "Photosynthesis is the process by which plants use sunlight to produce energy.",
    "Global warming is caused by the increase of greenhouse gases in Earth's atmosphere."
]
 
explainable_retriever = ExplainableRetriever(texts)

In [5]:
# Show results
query = "Why is the sky blue?"
results = explainable_retriever.retrieve_and_explain(query)

for i, result in enumerate(results, 1):
    print(f"Result {i}:")
    print(f"Content: {result['content']}")
    print(f"Explanation: {result['explanation']}")
    print()


Result 1:
Content: The sky is blue because of the way sunlight interacts with the atmosphere.
Explanation: The relationship between the query "Why is the sky blue?" and the retrieved context "The sky is blue because of the way sunlight interacts with the atmosphere." is direct and explanatory. The context is highly relevant to the query as it addresses the fundamental reason behind the color of the sky, which is the interaction of sunlight with the Earth's atmosphere.

Here's how the context helps answer the query:

1. **Direct Relevance**: The context immediately responds to the query by stating the cause of the sky's blue color.
   
2. **Scientific Basis**: It hints at a scientific explanation involving sunlight and the atmosphere, which is essential for a deeper understanding. The interaction mentioned likely refers to the scattering of light, specifically Rayleigh scattering, which causes shorter (blue) wavelengths of light to scatter more than longer (red) wavelengths.

3. **Found