### Set up the environment and MLflow

In [1]:
import mlflow
from dotenv import load_dotenv

load_dotenv()

# Optional: Set an experiment to organize your traces
mlflow.set_experiment("RAG Agent Tutor")

# Enable tracing
mlflow.langchain.autolog()  # type: ignore

2026/01/02 18:08:28 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2026/01/02 18:08:28 INFO mlflow.store.db.utils: Updating database tables
2026/01/02 18:08:28 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2026/01/02 18:08:28 INFO alembic.runtime.migration: Will assume non-transactional DDL.
2026/01/02 18:08:28 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2026/01/02 18:08:28 INFO alembic.runtime.migration: Will assume non-transactional DDL.


### Initialize the chat model

In [3]:
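from os import getenv

from dotenv import load_dotenv
from langchain.chat_models import init_chat_model

# Safe to call again; keeps this cell runnable on its own.
load_dotenv()

# OpenRouter exposes an OpenAI-compatible API, so the "openai" provider
# is pointed at OpenRouter's base URL and authenticated with the OpenRouter key.
model = init_chat_model(
    model="xiaomi/mimo-v2-flash:free",
    model_provider="openai",
    base_url="https://openrouter.ai/api/v1",
    api_key=getenv("OPENROUTER_API_KEY"),
)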

# Example usage
response = model.invoke("What NFL team won the Super Bowl in the year Justin Bieber was born?")
print(response.content)

Justin Bieber was born on **March 1, 1994**. The Super Bowl played in the calendar year 1994 was **Super Bowl XXVIII**.

Here are the details:
*   **Champion:** **Dallas Cowboys**
*   **Opponent:** Buffalo Bills
*   **Score:** 30–13
*   **Date:** January 30, 1994


In [None]:
import getpass
import os

from langchain_localai import LocalAIEmbeddings

# The embeddings client below authenticates with the OpenRouter key,
# so prompt for that key if it is not already set.
if not os.environ.get("OPENROUTER_API_KEY"):
    os.environ["OPENROUTER_API_KEY"] = getpass.getpass("Enter API key for OpenRouter: ")

embeddings = LocalAIEmbeddings(
    openai_api_base="https://openrouter.ai/api/v1",
    model="openai/text-embedding-3-small",
    openai_api_key=os.environ["OPENROUTER_API_KEY"],
)

In [37]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

from bs4 import SoupStrainer  # top-level import works across bs4 versions
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

assert len(docs) == 1
print(f"Total characters: {len(docs[0].page_content)}")


from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

print(f"Split blog post into {len(all_splits)} sub-documents.")

Total characters: 43047
Split blog post into 63 sub-documents.
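
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs and sentences). The function name `split_with_overlap` and the sample string are made up for illustration.

```python
def split_with_overlap(
    text: str, chunk_size: int, chunk_overlap: int
) -> list[tuple[int, str]]:
    """Return (start_index, chunk) pairs from a fixed-size sliding window.

    Hypothetical helper: it ignores separators entirely, unlike the
    real text splitter, and only illustrates size and overlap.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for start, chunk in chunks:
    print(start, chunk)
```

Each chunk starts with the last four characters of the previous one, and the recorded start index plays the same role as the `start_index` metadata enabled by `add_start_index=True`.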


In [38]:
document_ids = vector_store.add_documents(documents=all_splits)

print(document_ids[:3])

['a3488f1b-8ae6-45db-97da-59817aaa599f', '20434d42-9991-4e60-9621-6133f5599316', '5c4718dd-0bc3-40ad-8fcb-8d9d2432aaf9']
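
Conceptually, a similarity search over an in-memory store ranks the stored embedding vectors by how close they are to the query embedding. The sketch below shows that idea with cosine similarity on toy two-dimensional vectors. It is an illustration under stated assumptions, not the `InMemoryVectorStore` implementation: the names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings": hypothetical values chosen only for illustration.
toy_store = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.7, 0.7],
    "doc-c": [0.0, 1.0],
}
result = top_k([1.0, 0.1], toy_store, k=2)
print(result)
```

The query vector points mostly along the first axis, so `doc-a` ranks first and `doc-b` second, while `doc-c` is left out at `k=2`.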


### RAG Agent

In [28]:
from langchain.tools import tool


@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information to help answer a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
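
With `response_format="content_and_artifact"`, the tool returns a pair: a string that is sent back to the model, and the raw documents kept alongside it as the artifact. The sketch below reproduces only the serialization step so the string format is easy to inspect. `Doc` is a hypothetical stand-in for LangChain's `Document`, and the sample texts and metadata are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize(docs: list[Doc]) -> str:
    """Join documents into the text block that the model would read."""
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )


retrieved = [
    Doc("First sample passage.", {"start_index": 0}),
    Doc("Second sample passage.", {"start_index": 800}),
]

content = serialize(retrieved)  # what the model sees
artifact = retrieved            # raw objects kept for downstream code
print(content)
```

Keeping the raw documents as the artifact lets application code inspect metadata later without parsing it back out of the string.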

In [29]:
from langchain.agents import create_agent


tools = [retrieve_context]
# If desired, specify custom instructions
prompt = (
    "You have access to a tool that retrieves context from a blog post. "
    "Use the tool to help answer user queries."
)
agent = create_agent(model, tools, system_prompt=prompt)

In [30]:
query = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()


What is the standard method for Task Decomposition?

Once you get the answer, look up common extensions of that method.

First, I need to understand what the user is asking. The question is: "What is the standard method for Task Decomposition?" and then "Once you get the answer, look up common extensions of that method."
So, Task Decomposition is a term I've heard before, but I'm not entirely sure what it specifically refers to in different contexts. It could be related to project management, computer science, or perhaps even cognitive psychology. I think the most common context is in project management or software development, where tasks are broken down into smaller, more manageable parts.
But to be precise, I should probably look this up to make sure I have the correct definition and standard method.
Let me think about how to approach this. Since the user mentioned a blog post, I should use the retrieve_context tool to find relevant information about Task Decomposition.
Wait, actua

In [31]:
from langchain.agents.middleware import dynamic_prompt, ModelRequest


@dynamic_prompt
def prompt_with_context(request: ModelRequest) -> str:
    """Build a system prompt containing context retrieved for the latest message."""
    last_query = request.state["messages"][-1].text
    retrieved_docs = vector_store.similarity_search(last_query)

    docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

    system_message = (
        "You are a helpful assistant. Use the following context in your response:"
        f"\n\n{docs_content}"
    )

    return system_message


agent = create_agent(model, tools=[], middleware=[prompt_with_context])

In [32]:
query = "What is task decomposition?"
for step in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


What is task decomposition?

Based on the provided context, **task decomposition** is the process of breaking down a large, complicated task into smaller, more manageable sub-tasks or steps.

Here are the three main ways task decomposition can be done:

1.  **By LLM with Simple Prompting:** This involves instructing the model to break down the task itself, using prompts like "Steps for XYZ. 1." or "What are the subgoals for achieving XYZ?"
2.  **Using Task-Specific Instructions:** This utilizes specific guidelines for a particular domain, such as "Write a story outline" as a sub-task for writing a novel.
3.  **With Human Inputs:** Relying on human guidance to define the steps.

Additionally, the text highlights specific techniques for handling complex tasks:

*   **Chain of Thought (CoT):** A standard prompting technique where the model is instructed to "think step by step" to decompose hard tasks into simpler steps, utilizing more computation time to solve the problem.
*   **Tree of 