## Lesson 3: Adding RAG

Lesson objective: Add a document database to a workflow

In this lab, you’ll use LlamaParse to parse a resume, load the parsed documents into a vector store, and run basic queries against them, first with a query engine and then through an agent.


In [1]:
import os

from dotenv import load_dotenv
from llama_index.utils.workflow import draw_all_possible_flows

In [2]:
load_dotenv()

llama_cloud_api_key = os.environ["LLAMA_CLOUD_API_KEY"]
openai_api_key = os.environ["OPENAI_API_KEY"]

In [3]:
# Jupyter already runs an asyncio event loop, and the libraries below start their own.
# nest_asyncio patches asyncio so that event loops can be nested inside each other.

import nest_asyncio

nest_asyncio.apply()

### Performing Retrieval-Augmented Generation (RAG) on a Resume Document


#### 1. Parsing the Resume Document


Using LlamaParse, you will transform the resume into a list of `Document` objects. By default, a `Document` object stores text along with some other attributes:

- metadata: a dictionary of annotations that can be appended to the text.
- relationships: a dictionary containing relationships to other Documents.

You can tell LlamaParse what kind of document it's parsing, so that it will parse the contents more intelligently. In this case, you tell it that it's reading a resume.


In [4]:
from llama_parse import LlamaParse, ResultType

In [5]:
documents = LlamaParse(
    api_key=llama_cloud_api_key,
    result_type=ResultType.MD,
    user_prompt="This is a resume, gather related facts together and format it as bullet points with headers",
).load_data(
    "./data/fake_resume.pdf",
)

Started parsing the file under job_id e0ca9ee3-7ab0-4ec7-a6e2-f580f709b8df


In [6]:
print(documents[2].text)

# Projects

# EcoTrack | GitHub

- Built full-stack application for tracking carbon footprint using React, Node.js, and MongoDB
- Implemented machine learning algorithm for providing personalized sustainability recommendations
- Featured in TechCrunch's "Top 10 Environmental Impact Apps of 2023"

# ChatFlow | Demo

- Developed real-time chat application using WebSocket protocol and React
- Implemented end-to-end encryption and message persistence
- Serves 5000+ monthly active users

# Certifications

- AWS Certified Solutions Architect (2023)
- Google Cloud Professional Developer (2022)
- MongoDB Certified Developer (2021)

# Languages

- English (Native)
- Mandarin Chinese (Fluent)
- Spanish (Intermediate)

# Interests

- Open source contribution
- Tech blogging (15K+ Medium followers)
- Hackathon mentoring
- Rock climbing


#### 2. Creating a Vector Store Index


`VectorStoreIndex.from_documents` returns an index object, a data structure that lets you quickly retrieve the context most relevant to your query. It's the foundation of RAG use cases. You can use an index to build query engines and chat engines, which enable question answering and chat over your data.
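
To make the retrieval idea concrete, here is a minimal sketch of what a vector index does conceptually: keep one embedding per chunk of text, then rank the chunks by cosine similarity to the query embedding. The three-dimensional vectors, the `toy_store` list, and the `top_k` helper are made up for illustration. A real `VectorStoreIndex` gets its embeddings from an embedding model and is not implemented this way.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical toy "embeddings" (hand-written, not produced by a model)
toy_store = [
    ("Senior Full Stack Developer at TechFlow Solutions", [0.9, 0.1, 0.0]),
    ("AWS Certified Solutions Architect (2023)", [0.1, 0.9, 0.1]),
    ("Rock climbing", [0.0, 0.1, 0.9]),
]

# A query vector that points mostly along the first axis
results = top_k([1.0, 0.2, 0.0], toy_store, k=2)
print(results)
```

The `similarity_top_k` argument used later in this lesson plays the same role as `k` here: it controls how many of the most similar chunks are handed to the LLM as context.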


In [7]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

In [8]:
index = VectorStoreIndex.from_documents(
    documents=documents,
    embed_model=OpenAIEmbedding(
        model_name="Cohere-embed-v3-english",
        api_key=openai_api_key,
        api_base=os.environ["OPENAI_API_BASE"],
    ),
)

#### 3. Creating a Query Engine with the Index


In [9]:
from llama_index.llms.openai import OpenAI

In [None]:
llm = OpenAI(
    model="gpt-4.1-nano",
    api_key=openai_api_key,
    api_base=os.environ["OPENAI_API_BASE"],
    temperature=0.5,
)

In [11]:
query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)

response = query_engine.query(
    "What is this person's name and what was their most recent job?"
)

print(response)

This person's name is Sarah Chen, and their most recent job was Senior Full Stack Developer at TechFlow Solutions.


#### 4. Storing the Index to Disk


In [12]:
storage_dir = "./storage"

index.storage_context.persist(persist_dir=storage_dir)

You can check whether your index has already been stored and, if it has, reload it from disk using the `load_index_from_storage` function, like this:


In [13]:
from llama_index.core import StorageContext, load_index_from_storage

In [None]:
# Check if the index is stored on disk
if os.path.exists(storage_dir):
    # Load the index from disk
    storage_context = StorageContext.from_defaults(persist_dir=storage_dir)

    restored_index = load_index_from_storage(
        storage_context=storage_context,
        embed_model=OpenAIEmbedding(
            model_name="Cohere-embed-v3-english",
            api_key=openai_api_key,
            api_base=os.environ["OPENAI_API_BASE"],
        ),
    )
else:
    print("Index not found on disk.")

Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage/index_store.json.


In [28]:
response = restored_index.as_query_engine(llm=llm).query(
    "What is this person's name and what was their most recent job?"
)

print(response)

The person's name is Sarah Chen, and their most recent job was Senior Full Stack Developer at TechFlow Solutions in San Francisco, CA.


### Making RAG Agentic


In [None]:
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import FunctionTool

In [16]:
def query_resume(q: str) -> str:
    """Answers questions about a specific resume."""

    # we're using the query engine we already created above
    response = query_engine.query(
        f"This is a question about the specific resume we have in our database: {q}"
    )

    return response.response

In [18]:
resume_tool = FunctionTool.from_defaults(fn=query_resume)

In [None]:
agent = FunctionAgent(
    system_prompt="You are an agent capable of using tools to complete tasks",
    tools=[resume_tool],
    llm=llm,
    timeout=30,
    verbose=True,
)

In [20]:
response = await agent.run(
    user_msg="How many years of experience does the applicant have?"
)

print(response)

Running step init_run
Step init_run produced event AgentInput
Running step setup_agent
Step setup_agent produced event AgentSetup
Running step run_agent_step
Step run_agent_step produced event AgentOutput
Running step parse_agent_output
Step parse_agent_output produced no event
Running step call_tool
Step call_tool produced event ToolCallResult
Running step aggregate_tool_results
Step aggregate_tool_results produced event AgentInput
Running step setup_agent
Step setup_agent produced event AgentSetup
Running step run_agent_step
Step run_agent_step produced event AgentOutput
Running step parse_agent_output
Step parse_agent_output produced event StopEvent
The applicant has over 6 years of experience.


### Wrapping the Agentic RAG into a Workflow


In [21]:
from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

In [22]:
class QueryEvent(Event):
    query: str

In [23]:
from llama_index.core.base.base_query_engine import BaseQueryEngine

In [None]:
class RAGWorkflow(Workflow):
    storage_dir: str = "./storage"
    llm: OpenAI
    query_engine: BaseQueryEngine

    # the first step will be setup
    @step
    async def set_up(self, ctx: Context, ev: StartEvent) -> QueryEvent:
        if not ev.resume_file:
            raise ValueError("No resume file provided")

        # define an LLM to work with
        self.llm = OpenAI(
            model="gpt-4.1-mini",
            api_key=openai_api_key,
            api_base=os.environ["OPENAI_API_BASE"],
            temperature=0.5,
        )

        # ingest the data and set up the query engine
        if os.path.exists(self.storage_dir):
            # you've already ingested your documents
            storage_context = StorageContext.from_defaults(persist_dir=self.storage_dir)

            index = load_index_from_storage(
                storage_context=storage_context,
                embed_model=OpenAIEmbedding(
                    model_name="Cohere-embed-v3-english",
                    api_key=openai_api_key,
                    api_base=os.environ["OPENAI_API_BASE"],
                ),
            )
        else:
            # parse and load your documents
            documents = LlamaParse(
                api_key=llama_cloud_api_key,
                result_type=ResultType.MD,
                user_prompt=(
                    "This is a resume, gather related facts together and format it as bullet points with headers"
                ),
            ).load_data(ev.resume_file)

            # embed and index the documents
            index = VectorStoreIndex.from_documents(
                documents,
                embed_model=OpenAIEmbedding(
                    model_name="Cohere-embed-v3-english",
                    api_key=openai_api_key,
                    api_base=os.environ["OPENAI_API_BASE"],
                ),
            )

            index.storage_context.persist(persist_dir=self.storage_dir)

        # either way, create a query engine
        self.query_engine = index.as_query_engine(llm=self.llm, similarity_top_k=5)

        # now fire off a query event to trigger the next step
        return QueryEvent(query=ev.query)

    # the second step will be to ask a question and return a result immediately
    @step
    async def ask_question(self, ctx: Context, ev: QueryEvent) -> StopEvent:
        response = self.query_engine.query(
            f"This is a question about the specific resume we have in our database: {ev.query}"
        )

        return StopEvent(result=response.response)

In [30]:
workflow = RAGWorkflow(timeout=120, verbose=False)

result = await workflow.run(
    resume_file="./data/fake_resume.pdf",
    query="Where is the first place the applicant worked?",
)

print("\n-=-=-=- Workflow Result -=-=-=-\n")
print(result)

Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./storage/index_store.json.

-=-=-=- Workflow Result -=-=-=-

The first place the applicant worked was StartupHub in San Jose, CA as a Junior Web Developer.


If you're particularly suspicious, you might notice there's a small bug here: if you run this a second time with a new resume, the code will find the index already stored for the old resume and never parse the new one. You don't need to fix that now, but think about how you might.
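
One possible fix, sketched under assumptions: derive the storage directory from a hash of the resume file's contents, so that each distinct resume gets its own persisted index. The helper name `storage_dir_for` and the `<base_dir>/<hash>` layout are hypothetical and not part of LlamaIndex.

```python
import hashlib
from pathlib import Path


def storage_dir_for(resume_file: str, base_dir: str = "./storage") -> str:
    """Return a storage directory unique to the contents of resume_file."""
    digest = hashlib.sha256(Path(resume_file).read_bytes()).hexdigest()
    # 16 hex characters keep the path short; collisions are very unlikely
    return str(Path(base_dir) / digest[:16])
```

In `set_up`, you could then compute `storage_dir_for(ev.resume_file)` and use that value in place of `self.storage_dir` for both the existence check and the `persist` call. A new resume hashes to a directory that doesn't exist yet, so it takes the parse-and-index branch.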


### Workflow Visualization


In [31]:
WORKFLOW_FILE = "./workflows/rag_workflow.html"
draw_all_possible_flows(workflow, filename=WORKFLOW_FILE)

./workflows/rag_workflow.html
