## Key Libraries Deep Dive

| Library | Purpose | Why Chosen |
| :-- | :-- | :-- |
| **langchain_community** | Connectors for data sources | Standardized document handling |
| **langchain_text_splitters** | Text processing | Context-aware chunking |
| **sentence-transformers** | Semantic embeddings | Open-source pretrained embedding models |
| **FAISS** | Vector storage | Optimized similarity search |
| **langchain_core** | Pipeline construction | Modular architecture |
| **langchain_perplexity** (`ChatPerplexity`) | LLM interface | Access to hosted Perplexity chat models |

# Workflow Summary
1. **Ingest** content from web sources
2. **Process** documents into manageable chunks
3. **Encode** text into numerical representations
4. **Store** vectors for efficient retrieval
5. **Retrieve** relevant context for queries
6. **Generate** answers using LLM with injected context

# Environment Setup
- **os**: Manages environment variables for secure credential handling
- **Security Note**: Always keep API keys in secure storage (never hardcode in production)

In [1]:
import os
os.environ["PPLX_API_KEY"] = "XXXXXXX"  # Perplexity API key read by ChatPerplexity; placeholder only, fill in your own key
os.environ["USER_AGENT"] = "Learning RAG"  # Identifies requests to web servers
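
One way to follow the security note is to read credentials from the environment and fail early when one is missing, rather than assigning them in the notebook. The sketch below is a minimal illustration; `require_env` is a hypothetical helper name, not part of any library used here.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty.

    Hypothetical helper for illustration; not a library function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


if __name__ == "__main__":
    # A default keeps this demo runnable on its own; real keys should come
    # from the shell, a .env file, or a secrets manager.
    os.environ.setdefault("USER_AGENT", "Learning RAG")
    print(require_env("USER_AGENT"))
```

Calling such a helper once at the top of the notebook surfaces a missing key before any network request is made.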

# Document Ingestion
- **WebBaseLoader**: Specialized web scraper that preserves metadata
- **Alternative loaders**: Available for PDFs, CSVs, etc.

In [2]:
from langchain_community.document_loaders import WebBaseLoader

urls = ["https://en.wikipedia.org/wiki/Retrieval-augmented_generation"]
loader = WebBaseLoader(urls)  # Web content fetcher
documents = loader.load()  # Returns list of Document objects

# Text Chunking
- **RecursiveCharacterTextSplitter**: Maintains semantic structure better than simple splitting
- Chunk size affects retrieval quality: larger chunks carry more context per hit, while smaller chunks match a query more precisely

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # Maximum characters per chunk; a common starting point, tune for your data
    chunk_overlap=200  # Characters shared by adjacent chunks to keep context continuous
)
splits = text_splitter.split_documents(documents)
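
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping character windows. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers natural separators such as paragraph and line breaks); `chunk_text` is a hypothetical function that only illustrates how the two parameters interact.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split `text` into fixed-size character windows that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.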

# Vector Embedding & Storage
- **HuggingFaceEmbeddings**: Converts text to numerical representations
- **FAISS**: Similarity-search library from Facebook AI Research (now Meta) for fast nearest-neighbor lookups over dense vectors; here the index is built in memory

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embedding = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"  # General-purpose sentence embedding model (768-dimensional vectors)
)
vectorstore = FAISS.from_documents(splits, embedding)  # Embeds each chunk and builds an in-memory FAISS index
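
Conceptually, the vector store answers the question "which stored vectors are closest to the query vector?". The sketch below shows that idea with a brute-force scan in plain Python. It is not how FAISS is implemented, the three-dimensional vectors are made up for illustration, and cosine similarity is an assumption (the metric actually used depends on how the index is configured).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored vectors most similar to `query`."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy index: (chunk text, embedding) pairs with invented values.
toy_index = [
    ("chunk about retrieval", [0.9, 0.1, 0.0]),
    ("chunk about cooking", [0.0, 0.2, 0.9]),
    ("chunk about embeddings", [0.7, 0.6, 0.1]),
]
results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(results)
```

A dedicated library replaces this linear scan with optimized index structures, which is what keeps search fast as the number of chunks grows.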

# RAG Chain Construction
- **ChatPromptTemplate**: Manages LLM instruction formatting
- **RunnablePassthrough**: Directly pipes user questions to prompt
- **Temperature**: 0.7 permits more varied phrasing; lower values are generally preferred when answers must stay close to the retrieved context

In [5]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_perplexity import ChatPerplexity
from langchain_core.runnables import RunnablePassthrough

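template = """Answer using ONLY this context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)  # Prompt with {context} and {question} slots

retriever = vectorstore.as_retriever()  # Similarity-search interface over the FAISS index

def format_docs(docs):
    # Join retrieved chunks into plain text so the prompt receives content, not Document reprs
    return "\n\n".join(doc.page_content for doc in docs)

llm = ChatPerplexity(
    model="sonar-pro",  # Perplexity-hosted chat model
    temperature=0.7  # Higher values vary the wording more; lower values are more deterministic
)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)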
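
It can help to see what the chain does with the LangChain plumbing removed: retrieve chunks, paste them into the template, and hand the resulting text to a model. The sketch below uses plain functions only. `build_prompt`, `run_chain`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins for illustration; no real retriever or LLM is called.

```python
from typing import Callable

TEMPLATE = """Answer using ONLY this context:
{context}

Question: {question}
"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and fill both template slots."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


def run_chain(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context, build the prompt, then generate an answer."""
    chunks = retrieve(question)
    prompt_text = build_prompt(chunks, question)
    return generate(prompt_text)


def fake_retrieve(question: str) -> list[str]:
    # Stand-in for a vector-store lookup; returns fixed chunks.
    return ["RAG retrieves documents first.", "The LLM then answers from them."]


def fake_generate(prompt_text: str) -> str:
    # Stand-in for an LLM call; reports the prompt size instead of answering.
    return f"stub answer from a prompt of {len(prompt_text.splitlines())} lines"


answer = run_chain("What is RAG?", fake_retrieve, fake_generate)
print(build_prompt(fake_retrieve("What is RAG?"), "What is RAG?"))
print(answer)
```

The same question string feeds both the retriever and the prompt, which is the role `RunnablePassthrough` plays in the chain.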

# Query Execution
- Invocation pattern matches LangChain's standard interface

In [6]:
response = rag_chain.invoke("Explain RAG architecture")
print(response.content)  # Displays formatted answer

## RAG Architecture Explained

Retrieval-Augmented Generation (RAG) is an architecture that enhances large language models (LLMs) by integrating an information retrieval component before generating responses. This design allows LLMs to access up-to-date, relevant information from external sources, improving factual accuracy and reducing errors known as hallucinations[4][5].

**Key Components and Stages**

- **Indexing**  
  Data (usually unstructured text, but also semi-structured or structured data) is first processed and converted into numerical representations called embeddings. These embeddings map content into a large vector space, capturing semantic meaning. The embeddings are stored in a vector database, allowing efficient retrieval of relevant information. The data is typically split into smaller chunks (such as sentences or paragraphs) before embedding to improve retrieval granularity[3][5].

- **Retri