# LangChain RAG

A typical RAG application has two main components:

**Indexing**: a pipeline for ingesting data from a source and indexing it. This usually happens offline.

**Retrieval and generation**: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

The most common full sequence from raw data to answer looks like:

### Indexing
1. **Load**: First we need to load our data. In LangChain this is done with Document Loader classes.
2. **Split**: Text splitters break large Documents into smaller chunks. This is necessary because embedding models accept only a limited amount of input text, and smaller chunks are also easier to search over and to fit into the model's finite context window.
3. **Embed**: Then we need to convert those chunks into vectors. This is done with an embedding model.
4. **Store**: We need somewhere to store and index our vectors from the text chunks, so that we can search over them later. This is done using a VectorStore.

### Retrieval and generation
5. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a Retriever.
6. **Generate**: An LLM produces an answer using a prompt that includes both the question and the retrieved data.

![Indexing pipeline](./indexing-pipeline.jpg)
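
The six steps above can be sketched end to end without any external service. This is only a toy illustration, not how LangChain implements them: a hard-coded string stands in for the document loader, sentences stand in for chunks, word counts stand in for an embedding model, a plain list stands in for the vector store, and the "generate" step stops at building the prompt instead of calling an LLM. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def load():
    # Load: a hard-coded string stands in for a document loader.
    return (
        "Task decomposition breaks a complicated task into smaller steps. "
        "Memory lets an agent recall earlier information. "
        "Tool use lets an agent call external APIs."
    )


def split(text):
    # Split: one chunk per sentence (real splitters work with size limits).
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]


def embed(text):
    # Embed: a bag-of-words count vector stands in for an embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(store, query, k=1):
    # Retrieve: rank stored chunks by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    # Generate (first half): put the retrieved chunks and the question in one prompt.
    context = "\n\n".join(chunks)
    return f"<context>\n{context}\n</context>\n\nQuestion: {question}"


# Indexing: load, split, embed, store.
store = [(chunk, embed(chunk)) for chunk in split(load())]

# Retrieval and generation: retrieve, then build the prompt an LLM would receive.
question = "What is task decomposition?"
top_chunks = retrieve(store, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The rest of the notebook replaces each stand-in with a real component: a web loader, a text splitter, OpenAI embeddings, an in-memory vector store, and a chat model.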


Prepare to import your API key

In [7]:
pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [8]:
#pip install langchain
!pip install -U langchain-openai



In [None]:
import getpass
import os
from dotenv import load_dotenv

load_dotenv(os.path.expanduser("~/Projekte/MOOC/OpenCampus/codespace/.env"))

if not os.environ.get("OPENAI_API_KEY"):
    # Alternative: prompt for the key interactively instead of failing:
    # os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
    raise EnvironmentError("OPENAI_API_KEY not found in the .env file.")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

In [10]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

In [11]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

In this guide we'll build an app that answers questions about a website's content. The specific website we will use is the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng.

We can create a simple indexing pipeline and RAG chain to do this in ~50 lines of code.

In [13]:
!pip install langchain-text-splitters langchain-community



Load & Split Your Documents

In [15]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [16]:
# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
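
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplification: `RecursiveCharacterTextSplitter` splits on separators such as paragraph and line breaks and then merges the pieces, so its chunks do not line up with fixed windows like these. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one. With the settings used above (1000 and 200), the overlap reduces the chance that a statement is cut off from the context that precedes it.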

Embed & Store Vectors

In [18]:
# Index chunks
_ = vector_store.add_documents(documents=all_splits)

In [19]:
retriever = vector_store.as_retriever(
    search_type="similarity",  # "mmr" or "similarity_score_threshold" also work
    search_kwargs={"k": 4},
)
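
The difference between the `"similarity"` and `"mmr"` search types can be illustrated with a small pure-Python sketch. Plain similarity search returns the `k` vectors closest to the query, while maximal marginal relevance (MMR) also penalizes candidates that resemble results already selected. This is a sketch of the general technique, not LangChain's implementation; the two-dimensional vectors are made up, and the trade-off weight is called `lambda_mult` here for illustration.

```python
import math


def cosine(a, b):
    # Cosine similarity between two dense vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k):
    # Plain similarity search: indices of the k vectors most similar to the query.
    order = sorted(
        range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True
    )
    return order[:k]


def mmr(query, vectors, k, lambda_mult=0.5):
    # Greedy MMR: repeatedly pick the candidate with the best balance of
    # relevance to the query and dissimilarity to what is already selected.
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query, vectors[i])
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
vectors = [
    [1.0, 0.10],   # 0: very close to the query
    [1.0, 0.12],   # 1: near-duplicate of 0
    [0.6, -0.80],  # 2: less relevant, but different from 0 and 1
]

print(top_k(query, vectors, k=2))  # [0, 1]
print(mmr(query, vectors, k=2))    # [0, 2]
```

Similarity search returns both near-duplicates, whereas MMR swaps the second one for a less similar but more diverse result. For a single blog post split into overlapping chunks, that can keep the retrieved context from repeating itself.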


In [20]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate

SYSTEM = """You are an expert assistant.
Answer *only* from the context between <context></context>;
if the answer isn’t there, say “I don't know.”"""
USER = """<context>\n{context}\n</context>\n\nQuestion: {input}"""

prompt = ChatPromptTemplate.from_messages([("system", SYSTEM), ("user", USER)])

# Wrap llm + prompt in a "stuff documents" chain (a Runnable) that
# inserts the retrieved documents into the {context} placeholder
combine_docs_chain = create_stuff_documents_chain(llm, prompt)

# Build the final RAG runnable: retrieve first, then combine and answer
rag_chain = create_retrieval_chain(retriever, combine_docs_chain)
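
Conceptually, "stuffing" means concatenating the text of all retrieved documents and substituting the result into the `{context}` placeholder of the prompt. The sketch below mimics that with plain Python, under a few assumptions: `Doc` is a hypothetical stand-in for LangChain's `Document`, the documents are joined with a blank line, and `retrieve` and `llm` are stub callables rather than real components. The returned dictionary mirrors the `input`, `context`, and `answer` keys used with the chain in this notebook.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for a LangChain Document.
    page_content: str


USER = "<context>\n{context}\n</context>\n\nQuestion: {input}"


def stuff_documents(docs, question, separator="\n\n"):
    # Join all document texts and fill the user message template.
    context = separator.join(d.page_content for d in docs)
    return USER.format(context=context, input=question)


def toy_retrieval_chain(question, retrieve, llm):
    # Retrieve documents, stuff them into the prompt, and ask the (stub) model.
    docs = retrieve(question)
    answer = llm(stuff_documents(docs, question))
    return {"input": question, "context": docs, "answer": answer}


# Stub components for illustration only.
def retrieve(question):
    return [Doc("A"), Doc("B")]


def llm(prompt):
    return f"(stub answer for a prompt of {len(prompt)} characters)"


message = stuff_documents([Doc("A"), Doc("B")], "Q?")
result = toy_retrieval_chain("Q?", retrieve, llm)
print(message)
print(result["answer"])
```

Because every retrieved chunk goes into one prompt, this approach only works while `k` chunks of `chunk_size` characters fit comfortably in the model's context window.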


In [21]:
question = "What is Task Decomposition?"
result = rag_chain.invoke({"input": question})

print(result["answer"])   # grounded answer
print(result["context"])  # list of retrieved Document objects (the sources)


Task Decomposition is a process where a complicated task is broken down into smaller and simpler steps. This can be achieved through methods such as Chain of Thought (CoT), which instructs the model to “think step by step” to decompose hard tasks, or via Tree of Thoughts, which explores multiple reasoning possibilities at each step by generating multiple thoughts per step, creating a tree structure. Task decomposition can be performed by the LLM using simple prompts, task-specific instructions, or with human inputs.
[Document(id='a7201102-3297-4187-9676-858646ae206c', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex 