Classic RAG (Dense Retrieval)

Plaintext:

Q: What are the library opening hours?
A: The library is open from 9am to 8pm on weekdays.

Q: How do I obtain my student ID?
A: Student IDs are issued at the administration desk.

In [1]:
!pip install langchain-community



In [2]:
!pip install faiss-cpu



In [4]:
import requests
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain.embeddings.base import Embeddings
from langchain.llms.base import LLM
from typing import List, Any

# docs: List of tuples containing question-answer pairs.
# Each tuple represents a document with a question and its corresponding answer.
docs = [
    ("What are the library opening hours?", "The library is open from 9am to 8pm on weekdays."),
    ("How do I obtain my student ID?", "Student IDs are issued at the administration desk."),
]

# texts: List of formatted strings combining questions and answers.
# Each string is formatted as "Q: <question>\nA: <answer>" for embedding and retrieval.
texts = [f"Q: {q}\nA: {a}" for q, a in docs]

# CustomAPIEmbeddings: Custom embedding class using your /embedding API.
# - Inherits from LangChain's Embeddings base class.
# - embed_documents() embeds a list of texts by calling embed_query() for each.
# - embed_query() sends a POST request to your /embedding endpoint with the text and model version.
# - The API returns a dense vector (embedding) for the input text.
# - Example: The text "Q: What are the library opening hours?\nA: ..." is converted into a high-dimensional vector (e.g., 1536 dimensions).
class CustomAPIEmbeddings(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        response = requests.post(
            "http://localhost:8000/embedding",
            json={"input": text, "model_version": "text-embedding-ada-002"},
            timeout=10,
        )
        response.raise_for_status()
        return response.json()["embedding"]

# CustomAPILLM: Custom LLM class using your /chat API.
# - Inherits from LangChain's LLM base class.
# - _call() sends a POST request to your /chat endpoint with the system prompt, user prompt, and model version.
# - The API returns a generated answer in the "content" field.
# - Example: When a user asks "When can I go to the library?", the system retrieves the most relevant Q&A pair and uses the LLM to answer in context.
class CustomAPILLM(LLM):
    # stop defaults to None instead of a shared mutable list; this class does not forward it to the API.
    def _call(self, prompt: str, stop: Any = None, **kwargs: Any) -> str:
        payload = {
            "system_prompt": "You are a helpful assistant.",
            "user_prompt": prompt,
            "model_version": "gpt-4.1-2025-04-14"
        }
        response = requests.post(
            "http://localhost:8000/chat",
            json=payload,
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["content"]

    @property
    def _llm_type(self) -> str:
        return "custom_api_llm"

# vectorstore: FAISS vector store instance.
# - FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors.
# - It is commonly used to quickly find similar items (e.g., documents, images) in large datasets.
# - from_texts() creates a vector store from the provided texts using the specified embedding model.
#   Example: If you have 1000 FAQ pairs, each will be converted to a vector and stored in FAISS for fast retrieval.
# - CustomAPIEmbeddings generates dense vector representations (embeddings) for each text using your custom embedding API.
# - Indexes: FAISS builds an index over these vectors for nearest neighbor search. It is not a traditional database index but a structure built for vector similarity. LangChain's from_texts() creates a flat index by default, which compares the query against every stored vector exactly; FAISS also offers approximate structures (such as inverted-file or HNSW indexes) for larger collections.
# - Vector dimensions: Each embedding is a list (array) of floats whose length (dimension) is fixed by the embedding model (e.g., 1536 for ada-002). More dimensions can capture finer semantic distinctions, at the cost of memory and search time.
# - Why arrays? Similarity is measured with arithmetic on the vectors (Euclidean distance, dot product, or cosine similarity), so every vector must have the same length.
vectorstore = FAISS.from_texts(texts, CustomAPIEmbeddings())

# qa: RetrievalQA chain instance.
# - RetrievalQA is a LangChain chain that combines a retriever and a language model (LLM) for question answering.
# - from_chain_type() initializes the chain with:
#     - llm: The language model to generate answers (here, your custom LLM).
#     - retriever: The retriever interface from the vector store, used to fetch relevant documents.
# - Example: When a user asks "When can I go to the library?", the retriever converts the question to a vector, finds the most similar vectors (documents) in FAISS, and passes them to the LLM to generate a final answer.
qa = RetrievalQA.from_chain_type(
    llm=CustomAPILLM(), retriever=vectorstore.as_retriever()
)

# Run the QA chain with a user query.
# - qa.invoke() takes the question under the "query" key, retrieves relevant documents, and has the LLM generate an answer, returned under the "result" key.
# - invoke() replaces qa.run(), which is deprecated in recent LangChain releases.
print(qa.invoke({"query": "When can I go to the library?"})["result"])

You can go to the library on weekdays between 9am and 8pm.
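
The chain above hides two steps: finding the stored text closest to the query vector, and placing that text ahead of the question in the prompt sent to the LLM. The sketch below imitates both steps in plain Python so they can be inspected without the /embedding and /chat services running. Everything in it is an assumption made for illustration: the 3-dimensional vectors are hand-picked rather than produced by an embedding model, and `INDEX`, `l2_distance`, `retrieve`, `build_prompt`, and the prompt wording are hypothetical stand-ins, not LangChain's or FAISS's actual internals.

```python
import math
from typing import List, Tuple

# Toy 3-dimensional "embeddings", hand-picked for illustration only.
# Real embeddings come from the /embedding API and have many more dimensions.
INDEX: List[Tuple[str, List[float]]] = [
    (
        "Q: What are the library opening hours?\n"
        "A: The library is open from 9am to 8pm on weekdays.",
        [0.9, 0.1, 0.0],
    ),
    (
        "Q: How do I obtain my student ID?\n"
        "A: Student IDs are issued at the administration desk.",
        [0.1, 0.9, 0.2],
    ),
]


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_vector: List[float], k: int = 1) -> List[str]:
    """Return the k stored texts whose vectors are closest to the query vector."""
    ranked = sorted(INDEX, key=lambda item: l2_distance(query_vector, item[1]))
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Place the retrieved texts ahead of the question (hypothetical template)."""
    context_block = "\n\n".join(contexts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical embedding of the user's question.
query_vector = [0.8, 0.2, 0.1]
top_texts = retrieve(query_vector, k=1)
prompt = build_prompt("When can I go to the library?", top_texts)
print(prompt)
```

With these toy vectors the library entry lies much closer to the query than the student ID entry, so only the library Q&A pair ends up in the prompt. This is the sense in which the LLM answers "in context": it sees the retrieved text, not the whole collection.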
