# Basic RAG test

## Environment Setup
This section sets up environment variables for API endpoints and keys.

In [5]:
from dotenv import load_dotenv
load_dotenv()

import os

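# load_dotenv() has already loaded the values from .env into os.environ.
# Fail early with a clear message if a required variable is missing.
for name in ('LLM_ENDPOINT', 'LANGCHAIN_ENDPOINT', 'LANGCHAIN_API_KEY'):
    if not os.getenv(name):
        raise RuntimeError(f"Missing environment variable: {name}")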



## Load and Parse Web Page
This section loads the blog post with LangChain's WebBaseLoader and uses a BeautifulSoup SoupStrainer to keep only the elements with the post-content, post-title, and post-header classes.

In [6]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()

USER_AGENT environment variable not set, consider setting it to identify your requests.


## Split Documents into Chunks
This section splits the loaded documents into chunks of up to 1,000 characters, with 200 characters of overlap between neighboring chunks, for embedding and retrieval.

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

splits = text_splitter.split_documents(docs)
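
To see what `chunk_size` and `chunk_overlap` mean in practice, here is a simplified sketch. It uses fixed-size character windows, not the recursive separator-based splitting that `RecursiveCharacterTextSplitter` performs, and the helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# prints ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.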

## Create Vector Store and Generate Embeddings
This section creates a Chroma vector store and generates embeddings for the text chunks using Ollama.

In [8]:
from langchain_community.vectorstores import Chroma
from langchain_ollama.embeddings import OllamaEmbeddings

vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OllamaEmbeddings(model="qwen3:14b")
)
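
Conceptually, a vector store keeps one embedding per chunk and answers a query by ranking chunks by similarity to the query embedding. The toy sketch below uses hand-written 3-dimensional vectors in place of real embeddings. Chroma's actual indexing and default distance metric may differ, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy store: (chunk text, made-up embedding) pairs.
toy_store = [
    ("chunk about planning", [1.0, 0.0, 0.0]),
    ("chunk about memory", [0.0, 1.0, 0.0]),
    ("chunk about tool use", [0.0, 0.0, 1.0]),
]
print(top_k([0.9, 0.1, 0.0], toy_store, k=2))
# prints ['chunk about planning', 'chunk about memory']
```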

## Cleanup Vector Store
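This section is optional. Uncomment the line below to delete the Chroma collection and free its resources once you are done. Running it before the later cells would leave the retriever with nothing to search.

In [None]:
# vectorstore.delete_collection()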

## Retrieve and Print Document
This section retrieves the chunks most similar to the query (the top four by default) and prints the content of the best match.

In [9]:
retriever = vectorstore.as_retriever()

docs = retriever.invoke("What is Task Decomposition?")

print(docs[0].page_content)

Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.
Self-Reflection#


## Retrieve Prompt from LangChain Hub
This section pulls a prompt from the LangChain hub and prints it.

In [10]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

print(prompt)

input_variables=['context', 'question'] input_types={} partial_variables={} metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})]


## Initialize Ollama LLM
This section initializes the Ollama language model (LLM) with the specified model and temperature.

In [11]:
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen3:14b", temperature=0)

## Build RAG Pipeline
This section defines the output parser, document formatting function, and constructs the Retrieval-Augmented Generation (RAG) chain.

In [12]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
     | prompt
     | llm
     | StrOutputParser()
)
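
To make the data flow concrete: the dictionary step produces a `context` string (the retrieved chunks joined by blank lines) and passes the question through unchanged, and the prompt then substitutes both values. The sketch below shows that substitution in plain Python. It uses `SimpleNamespace` objects as stand-ins for retrieved documents and only the tail of the hub prompt's template.

```python
from types import SimpleNamespace

def format_docs(docs):
    """Join the page_content of each document with a blank line in between."""
    return "\n\n".join(doc.page_content for doc in docs)

# Stand-ins for retrieved Document objects; only page_content is needed here.
fake_docs = [
    SimpleNamespace(page_content="First chunk."),
    SimpleNamespace(page_content="Second chunk."),
]

# Shortened version of the prompt template, for illustration only.
template = "Question: {question} \nContext: {context} \nAnswer:"
filled = template.format(
    question="What is Task Decomposition?",
    context=format_docs(fake_docs),
)
print(filled)
```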

## Run RAG Chain and Display Response
This section invokes the RAG chain with a sample question and prints the generated response.

In [None]:
response = rag_chain.invoke("What is Task Decomposition?")
print(response)
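
Reasoning models such as qwen3 may prepend a `<think>...</think>` block to the answer. If only the final answer is wanted, a small post-processing step can remove it. This is a sketch that assumes the tags appear literally in the returned string; `strip_think` is a hypothetical helper.

```python
import re

# Non-greedy match so that each <think>...</think> block is removed separately.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think(text):
    """Remove every <think>...</think> block and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()

raw = (
    "<think>\nLet me look at the context first.\n</think>\n\n"
    "Task decomposition splits a goal into subtasks."
)
print(strip_think(raw))
# prints: Task decomposition splits a goal into subtasks.
```

Because LangChain coerces plain functions in a pipe into runnables, it could be appended as a final step, for example `rag_chain | strip_think`.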

# Multi-Query Generation test

## Prompt Generation and Chain

This section creates a prompt template that asks the LLM for five rephrasings of the user question, then defines a chain that splits the response into one query per line.

In [None]:
from langchain.prompts import ChatPromptTemplate

template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question} /no_think"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_perspectives
    | ChatOllama(
        model="qwen3:14b",
        temperature=0
    )
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

## Generate and Clean Alternative Queries

This section generates multiple alternative versions of a user question using an LLM, cleans the output to remove any extraneous tokens, and displays the resulting queries. These alternative queries help improve document retrieval by providing diverse perspectives for similarity search.

In [48]:
question = "What is Task Decomposition for LLM agents?"
generated_queries_list = generate_queries.invoke({"question": question})

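def clean_llm_output(queries):
    """Drop any <think>...</think> block and blank lines from the split LLM output."""
    last = 0
    for i, q in enumerate(queries):
        if "</think>" in q:
            last = i + 1  # keep only the lines after the closing tag

    # Strip whitespace and skip empty lines so no blank query reaches the retriever.
    return [q.strip() for q in queries[last:] if q.strip()]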

cleaned_generated_queries_list = clean_llm_output(generated_queries_list)

for i, q in enumerate(cleaned_generated_queries_list):
    print(f"Query {i+1}: {q}")

Query 1: What is the process of breaking down complex tasks into smaller subtasks for LLM agents?  
Query 2: How do LLM agents utilize task decomposition to improve performance on complex problems?  
Query 3: Can you explain the concept of task decomposition in the context of large language model agents?  
Query 4: What role does task decomposition play in enabling LLM agents to handle multi-step tasks?  
Query 5: In what ways is task decomposition applied to enhance the coordination of LLM agents in collaborative settings?


## Unique Document Retrieval Chain

This section defines a retrieval chain that generates multiple alternative queries for a user question, retrieves relevant documents for each query, and returns the unique union of all retrieved documents. This approach improves retrieval diversity and reduces redundancy by combining results from different query perspectives.

In [49]:
from langchain.load import dumps, loads

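def get_unique_union(documents: list[list]):
    """Return the unique union of the documents retrieved for each query."""
    # Document objects are not hashable by default, so serialize each one to a string first.
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # dict.fromkeys removes duplicates while keeping first-seen order (set() would not).
    unique_docs = list(dict.fromkeys(flattened_docs))
    return [loads(doc) for doc in unique_docs]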

retrieval_chain = generate_queries | clean_llm_output | retriever.map() | get_unique_union

docs = retrieval_chain.invoke({"question": question})
print(f"Retrieved {len(docs)} unique documents.")

Retrieved 8 unique documents.
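
The deduplication step can be tried without a vector store. The sketch below uses plain dicts and `json.dumps` as stand-ins for `Document` objects and `langchain.load.dumps`; `unique_union` is a hypothetical name.

```python
import json

def unique_union(result_lists):
    """Flatten per-query result lists and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for results in result_lists:
        for doc in results:
            # Dicts are not hashable, so use a serialized string as the fingerprint.
            key = json.dumps(doc, sort_keys=True)
            if key not in seen:
                seen.add(key)
                unique.append(doc)
    return unique

# Two queries whose result lists share one document ("B").
per_query_results = [
    [{"page_content": "A", "source": "post"}, {"page_content": "B", "source": "post"}],
    [{"page_content": "B", "source": "post"}, {"page_content": "C", "source": "post"}],
]
print([d["page_content"] for d in unique_union(per_query_results)])
# prints ['A', 'B', 'C']
```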


## Multi-Query RAG Chain Execution

This section constructs and executes a Retrieval-Augmented Generation (RAG) chain that leverages multiple alternative queries for improved document retrieval. It combines the results from diverse query perspectives and generates a comprehensive answer to the user question based on the aggregated context.

In [51]:
from operator import itemgetter

template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

multi_query_rag_chain = (
    {"context": retrieval_chain, "question": itemgetter("question")}
    | prompt
    | llm
    | StrOutputParser()
)

mqrc_output = multi_query_rag_chain.invoke({"question": question})

print(mqrc_output)

<think>
Okay, I need to figure out what Task Decomposition means for LLM agents based on the provided context. Let me start by going through the documents again to pick up relevant information.

The first document mentions that the AI assistant can parse user input into several tasks, each with attributes like task, ID, dependencies, and arguments. The "dep" field indicates dependencies on previous tasks. It also says tasks must be selected from an available list and that there's a logical order. If parsing isn't possible, return empty JSON. This seems to relate to breaking down a user's request into multiple tasks with dependencies.

Another document talks about HuggingGPT's four stages, with the first being Task planning where an LLM acts as the brain to parse user requests into multiple tasks. Each task has attributes: type, ID, dependencies, and arguments. They use few-shot examples for task parsing. This sounds like task decomposition, breaking down the user's input into structure