# Conversational RAG with LangChain, FAISS, and OpenAI: Step-by-Step Explanation

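## 1. **Install Required Packages**

```python
%pip install langchain langchain-community langchain-openai faiss-cpu sentence-transformers openai tiktoken python-dotenv rouge-score rouge nltk
```

*Installs the libraries for document loading, embeddings, vector search, the OpenAI chat model, and environment management (`%pip` installs into the running kernel's environment). `rouge-score`, `rouge`, and `nltk` are not used in the steps below; they are typically used to evaluate generated answers.*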

---

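## 2. **Load Environment Variables**

```python
from dotenv import load_dotenv

# Read KEY=value pairs from a .env file into os.environ (existing variables are not overridden)
load_dotenv()
```

*Loads API keys and other secrets from a `.env` file so you don't hardcode them in the notebook. `ChatOpenAI` (step 7) reads `OPENAI_API_KEY` from the environment, so the `.env` file should contain a line such as `OPENAI_API_KEY=<your key>`.*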
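
A quick check right after `load_dotenv()` reports a missing key at the point where it should have been loaded, rather than as an error from a later cell. This is a minimal sketch; the helper name `missing_env_vars` is hypothetical, and it only assumes that `OPENAI_API_KEY` is the variable the OpenAI integration reads.

```python
import os

def missing_env_vars(names):
    """Return the names from `names` that are unset or empty in os.environ."""
    return [name for name in names if not os.environ.get(name)]

# Hypothetical usage right after load_dotenv():
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```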

---

## 3. **Load and Preprocess Documents**

```python
import re
from pathlib import Path

from langchain_core.documents import Document
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import FAISS

def load_documents_from_folder(folder_path):
    """Load each .txt file in the folder as a separate Document."""
    documents = []
    for file in sorted(Path(folder_path).glob("*.txt")):  # sorted for a stable order
        text = file.read_text(encoding="utf-8")
        clean_text = re.sub(r"\s+", " ", text.strip())  # collapse runs of whitespace into single spaces
        documents.append(Document(page_content=clean_text, metadata={"source": file.name}))
    return documents

# Load documents from folder
folder_path = "Dataset/ancient_greece_data"
documents = load_documents_from_folder(folder_path)
print(f"Loaded {len(documents)} documents")
```

*Reads every `.txt` file in the folder, collapses runs of whitespace into single spaces, and wraps each file as one LangChain `Document` whose `metadata["source"]` records the file name. No chunking is done, so each file becomes a single vector in step 4.*
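
Because each file becomes one `Document`, a long file is represented by a single vector. A common refinement is to split text into overlapping chunks before embedding; LangChain provides `RecursiveCharacterTextSplitter` (in `langchain_text_splitters`) for this. The sketch below is a simplified, hypothetical fixed-size character splitter that only illustrates the idea; the sizes are illustrative assumptions, not tuned values.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into windows of `chunk_size` characters; consecutive windows share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def chunk_records(records, chunk_size=500, overlap=50):
    """Chunk (text, source) pairs, keeping the source file name on every chunk."""
    return [
        {"text": chunk, "source": source, "chunk": i}
        for text, source in records
        for i, chunk in enumerate(chunk_text(text, chunk_size, overlap))
    ]


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

In the notebook itself, `RecursiveCharacterTextSplitter(chunk_size=..., chunk_overlap=...).split_documents(documents)` would play this role and copy each document's `metadata` onto its chunks.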

---

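## 4. **Create Embeddings and Build FAISS Index**

```python
# Embed each document with a sentence-transformers model (downloaded on first use)
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Build a FAISS index from the documents and persist it to disk
vectorstore = FAISS.from_documents(documents, embedding_model)
vectorstore.save_local("faiss_index_ancient_greece")
```

*Converts each document to a 384-dimensional embedding, stores the vectors in a FAISS index for fast similarity search, and saves the index to the `faiss_index_ancient_greece` folder. Note that `all-MiniLM-L6-v2` truncates input longer than 256 word pieces, so with one `Document` per file only the beginning of a long file influences its vector.*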
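
By default `FAISS.from_documents` builds an exact flat index (`IndexFlatL2`, Euclidean distance), so a query is compared against every stored vector. The pure-Python sketch below performs the same brute-force search on toy 2-D vectors (the notebook's vectors have 384 dimensions). It illustrates the computation only, not FAISS's optimized implementation; FAISS reports squared distances, which rank vectors identically.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, vectors, k=4):
    """Return (index, distance) for the k stored vectors closest to `query`, closest first."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

stored = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]  # toy "document" embeddings
print(nearest([0.9, 0.1], stored, k=2))  # closest: index 1, then index 2
```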

---

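## 5. **Load FAISS Index (for Reuse)**

```python
# Reload the saved index; it must be queried with the same embedding model that built it.
# allow_dangerous_deserialization=True is required because part of the index is stored with pickle,
# so only load index folders you created yourself.
vectorstore = FAISS.load_local(
    "faiss_index_ancient_greece",
    embedding_model,
    allow_dangerous_deserialization=True,
)
```

*Loads the previously saved FAISS index so later sessions can skip re-embedding. In a fresh kernel, run the imports from step 3 and define `embedding_model` (step 4) before this cell.*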
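
`save_local` writes two files into the folder: `index.faiss` (the vectors) and `index.pkl` (the pickled docstore and ID mapping), which is why loading needs `allow_dangerous_deserialization=True`. To avoid rebuilding when an index already exists, you could check for both files first. A minimal sketch; the helper `saved_index_exists` is hypothetical and assumes the default `index_name="index"`.

```python
from pathlib import Path

def saved_index_exists(folder, index_name="index"):
    """True if `folder` holds both files that FAISS.save_local writes for `index_name`."""
    base = Path(folder)
    return (base / f"{index_name}.faiss").is_file() and (base / f"{index_name}.pkl").is_file()

# Hypothetical usage, combining steps 4 and 5:
# if saved_index_exists("faiss_index_ancient_greece"):
#     vectorstore = FAISS.load_local("faiss_index_ancient_greece", embedding_model,
#                                    allow_dangerous_deserialization=True)
# else:
#     vectorstore = FAISS.from_documents(documents, embedding_model)
#     vectorstore.save_local("faiss_index_ancient_greece")
```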

---

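## 6. **Create a Retriever**

```python
retriever = vectorstore.as_retriever()
```

*Wraps the vector store as a retriever, the interface LangChain chains call to fetch relevant documents for a query. By default it runs a similarity search and returns the top 4 documents; pass `search_kwargs={"k": ...}` to `as_retriever` to change that number.*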
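
Conceptually, a retriever is "query text in, top-k documents out." The sketch below mimics that interface with a hypothetical `ToyRetriever` class and a stand-in word-count embedding (an assumption for illustration only; the notebook uses `all-MiniLM-L6-v2`), ranking by cosine similarity. The `invoke` name mirrors how LangChain retrievers are called, but this is not LangChain code.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: lowercase word counts (NOT a real sentence embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyRetriever:
    def __init__(self, docs, k=4):
        # docs: list of (text, source) pairs; k=4 mirrors the retriever's default
        self.docs = [(text, source, embed(text)) for text, source in docs]
        self.k = k

    def invoke(self, query):
        """Return the k most similar documents as dicts with text and source."""
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[2]), reverse=True)
        return [{"page_content": text, "source": source} for text, source, _ in ranked[: self.k]]

retriever_demo = ToyRetriever(
    [("Socrates was an Athenian philosopher.", "socrates.txt"),
     ("Sparta was a military city-state.", "sparta.txt")],
    k=1,
)
print(retriever_demo.invoke("Who was Socrates?")[0]["source"])  # socrates.txt
```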

---

## 7. **Initialize the LLM**

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
```

*Sets up OpenAI's `gpt-4o-mini` chat model. It is used twice: to rewrite follow-up questions (step 8) and to generate the final answers (step 9).*

---

## 8. **Create a History-Aware Retriever**

```python
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Prompt that asks the LLM to turn a follow-up question into a standalone one
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that rewrites follow-up questions into standalone questions using chat history."),
        MessagesPlaceholder("chat_history"),
        ("human", """Given the above chat history and the latest user question below,
reformulate it into a standalone question. Do not answer the question.
If it's already standalone, return it as is.

Latest user question:
{input}"""),
    ]
)

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
```

*Builds a retriever that takes chat history into account. When `chat_history` is empty, the user's input goes straight to the retriever; otherwise the LLM first rewrites the latest question into a standalone one (for example, "Where did he live?" becomes something like "Where did Socrates live?"), and the rewritten question is used for retrieval.*
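
The branching described above can be sketched in plain Python. `rewrite_fn` and `retrieve_fn` are hypothetical stand-ins for the LLM call and the retriever, and `fake_rewrite` is a toy substitution, not what the model actually does; the sketch only mirrors the control flow.

```python
def history_aware_retrieve(inputs, rewrite_fn, retrieve_fn):
    """Rewrite the question only when there is chat history, then retrieve."""
    history = inputs.get("chat_history") or []
    if not history:
        query = inputs["input"]  # first turn: use the question as-is
    else:
        query = rewrite_fn(history, inputs["input"])  # later turns: make it standalone
    return retrieve_fn(query)


# Stand-ins for illustration only (no LLM involved):
def fake_rewrite(history, question):
    return " ".join("Socrates" if word == "he" else word for word in question.split())

def fake_retrieve(query):
    return [f"docs for: {query}"]


history = [("human", "Who is Socrates?"), ("ai", "Socrates was an Athenian philosopher.")]
print(history_aware_retrieve({"input": "Where did he live?", "chat_history": history},
                             fake_rewrite, fake_retrieve))
# ['docs for: Where did Socrates live?']
```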

---

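## 9. **Create the RAG Prompt and Chain**

```python
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# System message: answer only from the retrieved documents, which are inserted at {context}
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Answer the question using only the provided context. "
    "If you don't know the answer, just say 'I don't know'.\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# "Stuff" all retrieved documents into {context} and ask the LLM
question_answer_chain = create_stuff_documents_chain(llm, prompt)
# Retrieve with the history-aware retriever, then answer with the chain above
contextual_rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
```

*Defines a prompt that restricts the LLM to the retrieved context, which `create_stuff_documents_chain` inserts into the `{context}` placeholder. `create_retrieval_chain` then connects the history-aware retriever to this answering chain; its output is a dict with the original inputs plus `context` (the retrieved documents) and `answer`.*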
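
The "stuff" strategy concatenates every retrieved document's text and places it into `{context}`; by default `create_stuff_documents_chain` joins each document's `page_content` with a blank line. The plain-Python sketch below reproduces the resulting system message; `stuff_documents` and `build_system_message` are hypothetical helpers and the sample texts are illustrative.

```python
SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Answer the question using only the provided context. "
    "If you don't know the answer, just say 'I don't know'.\n\n"
    "{context}"
)

def stuff_documents(page_contents, separator="\n\n"):
    """Concatenate all retrieved texts, as the 'stuff' strategy does."""
    return separator.join(page_contents)

def build_system_message(page_contents):
    """Fill the {context} slot of the system prompt with the stuffed documents."""
    return SYSTEM_TEMPLATE.format(context=stuff_documents(page_contents))

msg = build_system_message(["Socrates was an Athenian philosopher.", "He was tried in 399 BC."])
print(msg)
```

Because everything ends up in a single message, the prompt grows with the number of retrieved documents and their length, which is why `k` and document size matter.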

---

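## 10. **Enable Conversational Memory**

```python
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# In-memory map from session ID to that session's message history
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """Return the history for session_id, creating an empty one on first use."""
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    contextual_rag_chain,
    get_session_history,
    input_messages_key="input",           # the user's question
    history_messages_key="chat_history",  # where past messages are injected into the prompts
    output_messages_key="answer",         # which output is saved as the AI's reply
)
```

*Wraps the chain so each call loads the message history for its `session_id`, passes it in as `chat_history`, and afterward appends the new question and `answer` to that history. Histories live in the in-memory `store` dict, so they are lost when the kernel restarts.*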
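
The `store` dict keeps every message of every session, so a long conversation keeps growing the `chat_history` sent to the model. One possible refinement is sketched below with a hypothetical `SessionStore` class in plain Python (not LangChain's `ChatMessageHistory`): sessions stay separate, and each keeps only its last `max_messages` messages.

```python
from collections import defaultdict, deque

class SessionStore:
    """Per-session message lists, each capped at `max_messages` (oldest dropped first)."""

    def __init__(self, max_messages=10):
        self._sessions = defaultdict(lambda: deque(maxlen=max_messages))

    def add(self, session_id, role, content):
        self._sessions[session_id].append((role, content))

    def messages(self, session_id):
        return list(self._sessions[session_id])


sessions = SessionStore(max_messages=4)
sessions.add("abc1235", "human", "Who is Socrates?")
sessions.add("abc1235", "ai", "An Athenian philosopher.")
sessions.add("other", "human", "Who is Plato?")
print(len(sessions.messages("abc1235")), len(sessions.messages("other")))  # 2 1
```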

---

## 11. **Run the Conversational RAG Chain**

```python
response = conversational_rag_chain.invoke(
    {"input": "Who is Socrates?"},
    config={"configurable": {"session_id": "abc1235"}},
)
print(response["answer"])
```

*Asks a first question and prints the answer. The exchange is stored under session ID `abc1235`; since that session has no history yet, the question goes to the retriever unchanged.*
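
Steps 11 and 12 repeat the same `invoke` call with only the question changing, so a small wrapper can keep the session ID in one place. The helper `ask` is hypothetical and relies only on the `invoke(input, config=...)` call used above; `_EchoChain` is a stand-in so the sketch runs without an API key.

```python
def ask(chain, question, session_id="abc1235"):
    """Send one question to a history-aware chain under `session_id` and return the answer text."""
    response = chain.invoke(
        {"input": question},
        config={"configurable": {"session_id": session_id}},
    )
    return response["answer"]


class _EchoChain:
    """Stand-in for conversational_rag_chain: records calls and echoes the question."""

    def __init__(self):
        self.calls = []

    def invoke(self, inputs, config=None):
        self.calls.append((inputs, config))
        return {"answer": f"echo: {inputs['input']}"}


fake = _EchoChain()
print(ask(fake, "Who is Socrates?"))  # echo: Who is Socrates?
# With the real chain: print(ask(conversational_rag_chain, "Who is Socrates?"))
```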

---

## 12. **Ask a Follow-up Question**

```python
response = conversational_rag_chain.invoke(
    {"input": "Where did he live?"},
    config={"configurable": {"session_id": "abc1235"}},
)
print(response["answer"])
```

*Asks a follow-up in the same session. Because the earlier exchange is now in the history, the history-aware retriever rewrites the question so that "he" refers to Socrates before retrieving documents.*

---

## **Summary**

- **Load and preprocess documents** → **Embed and index with FAISS** → **Set up retriever and LLM** → **Make retrieval history-aware** → **Enable conversational memory** → **Ask questions in context**.
- This workflow enables a chatbot to answer questions about your dataset, using retrieval-augmented generation and chat history for context.

---