# 🔗 LangChain + LlamaIndex Integration for RAG Pipelines

This notebook walks through a retrieval-augmented generation (RAG) pipeline built with **LangChain**, then shows how to index and query the same documents with **LlamaIndex**.

## Key Features:
- Document loading & chunking
- Embedding generation
- Indexing with FAISS (via LangChain or LlamaIndex)
- Query-time retrieval and LLM response generation

📦 Required packages: `langchain`, `llama-index`, `llama-index-vector-stores-faiss`, `faiss-cpu`, `openai`, `tiktoken`, `docx2txt`, `pypdf`, `python-dotenv`

In [None]:
# !pip install langchain llama-index llama-index-vector-stores-faiss faiss-cpu openai tiktoken docx2txt pypdf python-dotenv

## 🔐 1. Set Up API Keys

In [None]:
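import os
from dotenv import load_dotenv

# Copy variables from a local .env file into os.environ (existing values are kept)
load_dotenv()
# Fail early with a clear message; assigning a missing key to os.environ raises TypeError
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your environment or .env file")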

## 📄 2. Load Documents (PDF, DOCX, or TXT)

In [None]:
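# Note: newer LangChain releases expose these loaders from `langchain_community.document_loaders`.
from langchain.document_loaders import TextLoader, PyPDFLoader, Docx2txtLoader

def load_documents(file_path):
    """Load one file into a list of LangChain Document objects."""
    if file_path.endswith(".pdf"):
        loader = PyPDFLoader(file_path)      # one Document per page (uses pypdf)
    elif file_path.endswith(".docx"):
        loader = Docx2txtLoader(file_path)   # uses docx2txt from the package list above
    else:
        loader = TextLoader(file_path)       # anything else is read as plain text
    return loader.load()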
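
To run a whole folder through `load_documents`, it can help to collect the candidate files first. The sketch below uses only the standard library. `SUPPORTED_SUFFIXES` and `find_supported_files` are hypothetical names introduced for illustration, not part of LangChain.

```python
from pathlib import Path

# Hypothetical helper (not a LangChain API): the suffixes that load_documents
# branches on, plus .txt for the plain-text fallback.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt"}

def find_supported_files(folder):
    """Return sorted path strings of files under `folder` with a supported suffix.

    Matching is case-sensitive, like the endswith checks in load_documents.
    """
    root = Path(folder)
    return sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix in SUPPORTED_SUFFIXES
    )
```

Each returned path can then be passed to `load_documents`, and the resulting lists concatenated before chunking.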

## ✂️ 3. Chunk Documents Using LangChain Text Splitter

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_documents(documents, chunk_size=500, chunk_overlap=50):
    """Split Documents into chunks; both sizes are measured in characters by default."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return splitter.split_documents(documents)
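
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sliding-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph breaks, newlines, and spaces. The function name `sliding_chunks` is hypothetical, and the sketch only illustrates how consecutive chunks share text.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-size windows; each window repeats the last
    `chunk_overlap` characters of the previous one. Illustration only."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
    return chunks

chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some text twice.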

## 🔍 4. Create Vector Store with FAISS + OpenAI Embeddings (LangChain)

In [None]:
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

def build_faiss_index(docs):
    """Embed every chunk with OpenAI and store the vectors in an in-memory FAISS index."""
    embeddings = OpenAIEmbeddings()
    db = FAISS.from_documents(docs, embeddings)
    return db
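
Conceptually, a vector store keeps one embedding per chunk and, at query time, returns the chunks whose vectors lie closest to the query's vector. The sketch below mimics what an exact (flat) L2 search does, using brute force in plain Python. The three-dimensional vectors are hand-written toy values, not real embeddings, and `l2_distance` and `top_k` are hypothetical helper names.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, indexed, k=2):
    """indexed: list of (text, vector) pairs. Return the k closest texts."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]

# Toy "embeddings", made up for illustration only.
indexed = [
    ("termination clause", [0.9, 0.1, 0.0]),
    ("payment schedule", [0.1, 0.9, 0.0]),
    ("governing law", [0.0, 0.1, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], indexed, k=2))  # ['termination clause', 'payment schedule']
```

FAISS performs the same kind of lookup with optimized native code, which matters once the index holds many high-dimensional vectors.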

## 💬 5. Query Using LangChain Retriever + LLM

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

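def query_langchain_retrieval(db, query):
    """Retrieve the chunks most similar to `query` and have the LLM answer from them."""
    retriever = db.as_retriever()
    llm = ChatOpenAI(model_name="gpt-4", temperature=0)  # temperature=0 reduces randomness
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    # Newer LangChain releases deprecate .run() in favor of qa.invoke({"query": query})["result"]
    return qa.run(query)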
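
`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places all retrieved chunks into a single prompt ahead of the question. The sketch below shows that assembly step in plain Python. The template wording and the names `PROMPT_TEMPLATE` and `build_stuffed_prompt` are assumptions for illustration; LangChain's built-in prompt text differs.

```python
# Hypothetical template; LangChain's actual prompt wording is different.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_stuffed_prompt(chunks, question):
    """Join retrieved chunk texts and insert them, with the question, into one prompt."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_stuffed_prompt(
    ["Either party may terminate with 30 days notice.", "Fees are due monthly."],
    "How can the agreement be terminated?",
)
print(prompt)
```

Because every retrieved chunk ends up in one prompt, the chunk size and the number of retrieved chunks together determine how much of the model's context window each query consumes.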

## 🦙 6. Optionally Use LlamaIndex for the Same Documents

In [None]:
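import faiss
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.faiss import FaissVectorStore

def llamaindex_workflow(folder_path, embed_dim=1536):
    """Index every file in folder_path with FAISS and return a callable that answers queries."""
    docs = SimpleDirectoryReader(folder_path).load_data()
    embed_model = OpenAIEmbedding()
    # embed_dim must match the embedding model's output size
    # (1536 is assumed here for the default OpenAI embedding model).
    vector_store = FaissVectorStore(faiss_index=faiss.IndexFlatL2(embed_dim))
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(
        docs, storage_context=storage_context, embed_model=embed_model
    )
    query_engine = index.as_query_engine()
    return query_engine.query  # usage: ask = llamaindex_workflow("./docs"); ask("...")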

## ✅ End-to-End Example (LangChain)
Load → Chunk → Index → Query

In [None]:
# file_path = "./sample.pdf"  # Replace with your own
# docs = load_documents(file_path)
# split_docs = split_documents(docs)
# db = build_faiss_index(split_docs)
# response = query_langchain_retrieval(db, "What is the purpose of the agreement?")
# print(response)