In [2]:
# Example RAG (Retrieval-Augmented Generation) workflow with LangChain and OpenAI

import os
from dotenv import load_dotenv
load_dotenv()

if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not set in .env file. Please add it before running this notebook.")

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# --- Set up a simple retriever for demonstration ---
# In practice, you would build your vectorstore from your own data
# Here, we use a dummy in-memory FAISS vectorstore for illustration

# Example documents
example_docs = [
    Document(page_content="An LLM-powered agent system includes a language model, a memory component, a planning module, and a tool integration layer."),
    Document(page_content="The main components are: LLM, memory, tools, and an orchestrator."),
    Document(page_content="Autonomous agents use LLMs, memory, planning, and external tools.")
]

# Create embeddings and vectorstore
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(example_docs, embeddings)
retriever = vectorstore.as_retriever()

# Define a system prompt that tells the model how to use the retrieved context
system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\nContext: {context}"""

# Define a question
question = "What are the main components of an LLM-powered autonomous agent system?"

# Retrieve relevant documents
docs = retriever.invoke(question)

# Combine the documents into a single string
# (retriever.invoke returns a list of Document objects, so no type check is needed)
docs_text = "\n".join(d.page_content for d in docs)

# Populate the system prompt with the retrieved context
system_prompt_fmt = system_prompt.format(context=docs_text)

# Create a model
model = ChatOpenAI(model="gpt-4o", temperature=0)

# Generate a response
response = model.invoke([
    SystemMessage(content=system_prompt_fmt),
    HumanMessage(content=question)
])

print(response.content)

The main components of an LLM-powered autonomous agent system are a language model (LLM), a memory component, a planning module, and a tool integration layer.


## What is the `Document` structure in LangChain?

The `Document` class in LangChain is a simple data structure that holds a piece of text along with optional metadata.

**Typical usage:**

```python
from langchain_core.documents import Document

doc = Document(
    page_content="This is the main text of the document.",
    metadata={"source": "example", "author": "John Doe"}  # metadata is optional
)
```

- `page_content` (**str**, required): The main text content of the document.
- `metadata` (**dict**, optional): Any additional information you want to associate with the document (e.g., source, author, tags).
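
To make the two fields concrete, here is a simplified stand-in written as a plain dataclass. It is a sketch, not LangChain's implementation: the real `Document` is a Pydantic model with additional behavior, and the name `SimpleDocument` is hypothetical. It only mirrors the shape described above, with `metadata` defaulting to an empty dict when omitted.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDocument:
    """Hypothetical stand-in mirroring the two fields described above."""

    page_content: str  # required: the main text
    metadata: dict[str, Any] = field(default_factory=dict)  # optional: extra info


# metadata can be omitted, in which case it defaults to an empty dict
plain = SimpleDocument(page_content="Autonomous agents use LLMs and tools.")

# or supplied explicitly
tagged = SimpleDocument(
    page_content="The main components are: LLM, memory, tools, and an orchestrator.",
    metadata={"source": "example", "author": "John Doe"},
)

print(plain.metadata)
print(tagged.metadata["source"])
```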

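This notebook sets only `page_content`, which is all the example needs: the vector store embeds the text, and the prompt is built from the text alone. Add `metadata` when you need to filter results or cite sources.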
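
If documents do carry metadata, one option is to include the source next to each passage when building the context string, so the model can cite it. The helper below is a minimal sketch: `format_docs_with_sources` is a hypothetical name, the `"source"` key is an assumption borrowed from the example above, and `SimpleNamespace` objects stand in for retrieved `Document` instances so the snippet runs without an API key. It relies only on the `page_content` and `metadata` attributes.

```python
from types import SimpleNamespace


def format_docs_with_sources(docs):
    """Join passages into one context string, labelling each with its source."""
    lines = []
    for doc in docs:
        # Fall back to "unknown" when a document has no "source" entry
        source = doc.metadata.get("source", "unknown")
        lines.append(f"[{source}] {doc.page_content}")
    return "\n".join(lines)


# Stand-ins for retrieved documents (hypothetical data)
retrieved = [
    SimpleNamespace(
        page_content="Agents use LLMs, memory, and tools.",
        metadata={"source": "notes.md"},
    ),
    SimpleNamespace(
        page_content="An orchestrator coordinates the components.",
        metadata={},
    ),
]

context = format_docs_with_sources(retrieved)
print(context)
```

The resulting string could be passed as `context` to `system_prompt.format(...)` in place of `docs_text`.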