<a href="https://colab.research.google.com/drive/1uzH-ybgKrDjNPEvrfwpE47IpD8jKRfnM?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

### 🔢 What is Multi-index?

Multi-index is a RAG technique that stores and retrieves information from several separate indexes or databases instead of a single, monolithic index. Data is distributed across specialized indexes, each of which can be optimized for a particular type of data or query pattern, allowing for more efficient and targeted retrieval.

### 🔧 Multi-index RAG Implementation:

1. **Retrieval from Multiple Indices:** We retrieve relevant documents from each vector store.
2. **Context Combination:** We combine the retrieved documents from all sources into a single context.
3. **Response Generation:** Using the combined context, we generate a final response to the original query.
4. **Source Usage Explanation:** We generate an explanation of how information from different sources was used to answer the question.
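
The first two steps can be sketched without any external libraries. This is a minimal illustration only: `ToyIndex` and `retrieve_and_combine` are hypothetical names, and bag-of-words cosine similarity stands in for the embedding-based search that the vector stores perform later in this notebook.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical stand-in for one vector store."""

    def __init__(self, texts):
        self.entries = [(text, Counter(tokenize(text))) for text in texts]

    def search(self, query, k=1):
        q = Counter(tokenize(query))
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def retrieve_and_combine(query, indexes, k=1):
    """Query every index, label each hit with its source, and join the hits."""
    labeled = []
    for i, index in enumerate(indexes, start=1):
        labeled.extend(f"Source {i}: {text}" for text in index.search(query, k=k))
    return "\n\n".join(labeled)


indexes = [
    ToyIndex(["Machine learning models improve with data.",
              "Decision trees split on features."]),
    ToyIndex(["NLP systems process human language.",
              "Tokenizers split text into tokens."]),
]
context = retrieve_and_combine("How do models improve with data and language?", indexes)
print(context)
```

Each index is searched independently, so a hit from one source never competes with hits from another for the top-`k` slots.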

### ⚙️ Setup

1. **[LLM](https://groq.com/):** Groq's free open-source LLM endpoints ([Groq API Key](https://console.groq.com/keys))
2. **[Vector Store](https://www.pinecone.io/learn/vector-database/):** [ChromaDB](https://www.trychroma.com/)
3. **[Embedding Model](https://qdrant.tech/articles/what-are-embeddings/):** [nomic-embed-text-v1.5](https://www.nomic.ai/blog/posts/nomic-embed-text-v1)
4. **[LLM Framework](https://python.langchain.com/v0.2/docs/introduction/):** LangChain
5. **[Huggingface API Key](https://huggingface.co/settings/tokens)**

### Install required libraries

In [1]:
!pip install -q -U \
     sentence-transformers==3.0.1 \
     langchain==0.3.19 \
     langchain-groq==0.2.4 \
     langchain-chroma==0.2.2 \
     langchain-community==0.3.18 \
     langchain-huggingface==0.1.2 \
     einops==0.8.1


### Import the LangChain, Groq, and Hugging Face embedding libraries

In [2]:
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
# Import from the dedicated packages installed above instead of the deprecated langchain.* paths
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import WebBaseLoader



In [3]:
import os
import getpass

#### Provide a Groq API key. You can create one to access free open-source models at the following link.

[Groq API Creation Link](https://console.groq.com/keys)




In [4]:
os.environ["GROQ_API_KEY"] = getpass.getpass()



#### Provide a Hugging Face API key. You can create one at the following link.

[Huggingface API Creation Link](https://huggingface.co/settings/tokens)




In [5]:
os.environ["HF_TOKEN"] = getpass.getpass()



### Step 1: Load and preprocess data code

In [6]:
def load_and_process_data(url):
    # Load data from web
    loader = WebBaseLoader(url)
    data = loader.load()

    # Split text into chunks (Experiment with Chunk Size and Chunk Overlap to get optimal chunking)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(data)

    return chunks
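
To see what `chunk_size` and `chunk_overlap` control, here is a simplified sketch. It is a plain sliding window over characters, not the algorithm `RecursiveCharacterTextSplitter` uses (the real splitter tries to break on separators such as paragraphs and sentences first); `sliding_chunks` is a hypothetical helper for illustration.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 1200-character text, used only to make the window sizes visible
text = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(text, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])
print(chunks[0][-50:] == chunks[1][:50])
```

A larger overlap reduces the chance that a sentence relevant to the query is cut in half at a chunk boundary, at the cost of storing more duplicated text.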

### Step 2: Create multiple vector stores code

In [7]:
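def create_vector_stores(chunks_list):
    # One embedding model is shared by every index
    embeddings = HuggingFaceEmbeddings(
        model_name="nomic-ai/nomic-embed-text-v1.5",
        model_kwargs={"trust_remote_code": True},
    )
    vector_stores = []
    for i, chunks in enumerate(chunks_list):
        # Give each store its own collection; without a name, every call writes to
        # Chroma's default "langchain" collection and the indexes are no longer separate
        vector_store = Chroma.from_documents(chunks, embeddings, collection_name=f"source_{i + 1}")
        vector_stores.append(vector_store)
    return vector_stores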

### Step 3: Multi-index RAG related code

1. **Retrieval from Multiple Indices:** We retrieve relevant documents from each vector store.
2. **Context Combination:** We combine the retrieved documents from all sources into a single context.
3. **Response Generation:** Using the combined context, we generate a final response to the original query.
4. **Source Usage Explanation:** We generate an explanation of how information from different sources was used to answer the question.

In [8]:
def multi_index_rag(query, vector_stores, llm):
    # Retrieve documents from each vector store
    all_docs = []
    for i, vector_store in enumerate(vector_stores):
        docs = vector_store.similarity_search(query, k=3)  # Top 3 chunks per index
        all_docs.extend([f"Source {i+1}: " + doc.page_content for doc in docs])

    # Combine retrieved documents
    context = "\n\n".join(all_docs)

    # Generate response using combined context
    response_prompt = ChatPromptTemplate.from_template(
        "You are an AI assistant tasked with answering questions based on the provided context. "
        "The context contains information from multiple sources related to AI, machine learning, and NLP. "
        "Please analyze the context carefully and provide a comprehensive answer to the question. "
        "If the context doesn't contain enough information, use your general knowledge to supplement the answer, "
        "but prioritize information from the context when available.\n\n"
        "Context:\n{context}\n\n"
        "Question: {query}\n"
        "Answer:"
    )
    response_chain = response_prompt | llm
    try:
        response = response_chain.invoke({"context": context, "query": query})
        final_answer = response.content
    except Exception as e:
        print(f"Error generating response: {e}")
        final_answer = "I apologize, but I encountered an error while generating the response."

    # Generate explanation of source usage
    explanation_prompt = ChatPromptTemplate.from_template(
        "Based on the answer you provided, explain how information from different sources was used to answer the question. "
        "If general knowledge was used to supplement the answer, mention that as well.\n\n"
        "Context: {context}\n"
        "Question: {query}\n"
        "Answer: {answer}\n"
        "Explanation:"
    )
    explanation_chain = explanation_prompt | llm
    try:
        explanation = explanation_chain.invoke({"context": context, "query": query, "answer": final_answer})
        source_explanation = explanation.content
    except Exception as e:
        print(f"Error generating explanation: {e}")
        source_explanation = "Unable to generate explanation due to an error."

    return {
        "final_answer": final_answer,
        "source_explanation": source_explanation,
        "retrieved_context": context
    }
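
The control flow of the function above (answer first, then a second call that explains source usage, each with its own fallback) can be exercised without an API key. The sketch below is an assumption-laden stand-in: `answer_then_explain`, `stub_llm`, and `failing_llm` are hypothetical names, and the "LLM" is any callable that maps a prompt string to a string rather than a `ChatGroq` instance.

```python
def answer_then_explain(query, context, llm):
    """Two-stage generation: produce an answer, then explain which sources it used."""
    try:
        answer = llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    except Exception as exc:
        print(f"Error generating response: {exc}")
        answer = "I apologize, but I encountered an error while generating the response."

    try:
        # The second prompt includes the first answer so the model can refer back to it
        explanation = llm(
            f"Context: {context}\nQuestion: {query}\nAnswer: {answer}\nExplanation:"
        )
    except Exception as exc:
        print(f"Error generating explanation: {exc}")
        explanation = "Unable to generate explanation due to an error."

    return {"final_answer": answer, "source_explanation": explanation}


def stub_llm(prompt):
    """Hypothetical model: returns canned text depending on which stage is asking."""
    if prompt.rstrip().endswith("Explanation:"):
        return "The answer relies on Source 1."
    return "Machine learning is a part of AI."


def failing_llm(prompt):
    """Hypothetical model that always fails, to show the fallback path."""
    raise RuntimeError("service unavailable")


context = "Source 1: Machine learning has been a part of AI from the beginning."
ok = answer_then_explain("How does ML relate to AI?", context, stub_llm)
failed = answer_then_explain("How does ML relate to AI?", context, failing_llm)
print(ok)
print(failed)
```

Because each stage has its own `try`/`except`, a failure in the explanation step does not discard an answer that was generated successfully.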

### Step 4: Chunk the web data and load it into Chroma vector stores

In [9]:
llm = ChatGroq(
    model="llama3-8b-8192",
    temperature=0.5
)

# Load and process data
urls = [
        "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "https://en.wikipedia.org/wiki/Machine_learning",
        "https://en.wikipedia.org/wiki/Natural_language_processing"
      ]
chunks_list = [load_and_process_data(url) for url in urls]

# Create multiple vector stores
vector_stores = create_vector_stores(chunks_list)


### Step 5: Run Multi-index RAG

This implementation shows the key parts of Multi-index RAG:

1. Use of multiple vector stores representing different sources or types of information
2. Retrieval and combination of information from multiple sources
3. Generation of a comprehensive response using the combined information
4. Explanation of how different sources contributed to the answer

In [10]:
# Example query
query = "How do AI, machine learning, and NLP relate to each other in modern applications?"

# Run Multi-index RAG
result = multi_index_rag(query, vector_stores, llm)

print("Query:", query)
print("\nFinal Answer:")
print(result["final_answer"])
print("\nSource Usage Explanation:")
print(result["source_explanation"])

Query: How do AI, machine learning, and NLP relate to each other in modern applications?

Final Answer:
Based on the provided context, AI, machine learning, and NLP are interconnected concepts that are closely related in modern applications. Here's a comprehensive overview of their relationships:

1. Machine Learning is a part of AI: According to the context, machine learning is the study of programs that can improve their performance on a given task automatically, and it has been a part of AI from the beginning.
2. NLP is a technology used in AI and machine learning: The context mentions that a Document AI platform sits on top of NLP technology, enabling users to quickly train a computer to extract specific data from documents. This implies that NLP is a technology used in AI and machine learning applications.
3. NLP is used in Document AI: The context highlights the use of NLP-powered Document AI platforms, which enable non-technical teams to quickly access information hidden in docu