## Adding RAG
This lesson adds a document database to a workflow.
You will parse a resume, load it into a vector store, and use an agent to run basic queries against the documents.
The documents are parsed with LlamaParse, an advanced document parser that reads PDFs, Word files, PowerPoint presentations, and Excel spreadsheets, and extracts information from complicated PDFs into a form that LLMs find easy to understand.

## Importing Libraries

In [1]:
from IPython.display import display, HTML
from helper import extract_html_content
from llama_index.utils.workflow import draw_all_possible_flows
import os

In [2]:
import nest_asyncio
nest_asyncio.apply()

We need two API keys:
- an Azure OpenAI API key, for the LLM and the embedding model
- a LlamaCloud API key, so that LlamaParse can parse the PDFs


In [3]:
import os 
import yaml
import json
from dotenv import load_dotenv
load_dotenv('../../.env')

True

In [6]:
llama_cloud_api_key = os.getenv("LLAMA_CLOUD_API_KEY")
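
Before going further, it can help to confirm that the keys were actually loaded. The following is a minimal sketch using only the standard library. `missing_env_vars` is a hypothetical helper, and the two Azure variable names are assumptions (the notebook does not show which names your `.env` file uses), so adjust them to match your setup.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# LLAMA_CLOUD_API_KEY is read above; the Azure names are assumed, not confirmed.
required = ["LLAMA_CLOUD_API_KEY", "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Failing early like this tends to be easier to debug than an authentication error raised deep inside a parsing or embedding call.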

## Performing Retrieval-Augmented Generation (RAG) on a Resume Document

### 1. Parsing the Resume Document 

Using LlamaParse, we transform the resume into a list of Document objects. By default, a Document object stores text along with some other attributes:
- metadata: a dictionary of annotations that can be appended to the text.
- relationships: a dictionary containing relationships to other Documents.
  
We can tell LlamaParse what kind of document it's parsing, so that it will parse the contents more intelligently. In this case, we tell it that it's reading a resume.

In [7]:
from llama_parse import LlamaParse

In [6]:
import certifi
certifi.where()

'C:\\Temp\\openai\\gen-ai-sd\\llamaindex\\.venv\\lib\\site-packages\\certifi\\cacert.pem'

In [8]:
documents = LlamaParse(
    api_key=llama_cloud_api_key,
    base_url=os.getenv("LLAMA_CLOUD_BASE_URL"),
    result_type="markdown",
    content_guideline_instruction="This is a resume, gather related facts together and format it as bullet points with headers"
).load_data(
    "data/fake_resume.pdf",
)

DEFAULT_BASE_URL:https://api.cloud.llamaindex.ai
base_url: /api/parsing/upload
Started parsing the file under job_id 3587d1cd-1be7-4b6c-96ba-dd8621470a9f


This gives you a list of Document objects you can feed to a VectorStoreIndex.
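
To get a quick overview of what the parser returned, a small helper like the one below may be useful. It is only a sketch: `summarize_documents` is a hypothetical name, and it assumes nothing more than that each item exposes the `.text` and `.metadata` attributes described above. The sample objects are stand-ins so the block runs on its own; in the notebook you would pass `documents` instead.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def summarize_documents(
    docs: Iterable[Any], preview_chars: int = 60
) -> List[Dict[str, Any]]:
    """Build one summary row per document: index, length, metadata keys, preview."""
    rows = []
    for i, doc in enumerate(docs):
        text = doc.text or ""
        rows.append(
            {
                "index": i,
                "chars": len(text),
                "metadata_keys": sorted(doc.metadata),
                # Flatten newlines so each preview fits on one line.
                "preview": text[:preview_chars].replace("\n", " "),
            }
        )
    return rows


# Stand-in objects for illustration only; they are not real parser output.
sample = [
    SimpleNamespace(text="# Projects\n- EcoTrack", metadata={"file_name": "fake_resume.pdf"}),
    SimpleNamespace(text="# Certifications", metadata={}),
]
for row in summarize_documents(sample):
    print(row)
```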

In [9]:
print(documents[2].text)

# Projects

# EcoTrack | GitHub

- Built full-stack application for tracking carbon footprint using React, Node.js, and MongoDB
- Implemented machine learning algorithm for providing personalized sustainability recommendations
- Featured in TechCrunch's "Top 10 Environmental Impact Apps of 2023"

# ChatFlow | Demo

- Developed real-time chat application using WebSocket protocol and React
- Implemented end-to-end encryption and message persistence
- Serves 5000+ monthly active users

# Certifications

- AWS Certified Solutions Architect (2023)
- Google Cloud Professional Developer (2022)
- MongoDB Certified Developer (2021)

# Languages

- English (Native)
- Mandarin Chinese (Fluent)
- Spanish (Intermediate)

# Interests

- Open source contribution
- Tech blogging (15K+ Medium followers)
- Hackathon mentoring
- Rock climbing


### 2. Creating a Vector Store Index
We feed the Document objects to `VectorStoreIndex`. `VectorStoreIndex` will use an embedding model to embed the text. 

In [45]:
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core import VectorStoreIndex

In [46]:
embedding_model = AzureOpenAIEmbedding(model_name="text-embedding-ada-002")
llm = AzureOpenAI(engine="gpt-4", model="gpt-4")

### 3. Creating a Query Engine with the Index
We build the index from the documents, then create a query engine on top of it and ask questions using an LLM.

In [47]:
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embedding_model
)

In [42]:
query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)
response = query_engine.query("What is this person's name and what was their most recent job?")
print(response)

This person's name is Sarah Chen, and her most recent job is Senior Full Stack Developer at TechFlow Solutions in San Francisco, CA.
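
To build intuition for what `similarity_top_k=5` asks for in the query above, here is a toy sketch of top-k retrieval by cosine similarity. It is an illustration under simplifying assumptions, not how `VectorStoreIndex` is implemented: the three-dimensional vectors are made up by hand, whereas real embeddings come from the embedding model and have far more dimensions. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    scored = sorted(
        chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True
    )
    return [text for text, _ in scored[:k]]


# Hand-made vectors standing in for real embeddings.
toy_chunks = [
    ("Work experience", [0.9, 0.1, 0.0]),
    ("Certifications", [0.1, 0.9, 0.0]),
    ("Interests", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.2, 0.0], toy_chunks, k=2))
```

A larger `k` gives the LLM more context to draw on, at the cost of a longer prompt and possibly less relevant chunks.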


### 4. Storing the Index to Disk
Here we persist the index to a local directory so it does not have to be rebuilt on every run. In a production setting, you would probably use a hosted vector store of some kind.

In [32]:
storage_dir = "./storage"  ## llama-index automatically creates the directory

index.storage_context.persist(persist_dir=storage_dir)

In [36]:
from llama_index.core import StorageContext, load_index_from_storage

## Reload the index from disk using the `load_index_from_storage` method

# Check if the index is stored on disk
if os.path.exists(storage_dir):
    # Load the index from disk
    storage_context = StorageContext.from_defaults(persist_dir=storage_dir)
    restored_index = load_index_from_storage(storage_context)
else:
    print("Index not found on disk.")

In [48]:
restored_query_engine = restored_index.as_query_engine(llm=llm, embed_model=embedding_model, similarity_top_k=5)

In [49]:
response = restored_query_engine.query("What is this person's name and what was their most recent job?")
print(response)

This person's name is Sarah Chen, and her most recent job is Senior Full Stack Developer at TechFlow Solutions in San Francisco, CA.


## Making RAG Agentic

With a RAG pipeline in hand, let's turn it into a tool that can be used by an agent to answer questions. This is a stepping-stone towards creating an agentic system that can perform your larger goal.

In [50]:
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import FunctionCallingAgent

First, create a regular Python function that performs a RAG query. It's important to give this function a descriptive name, to annotate its input and output types, and to include a docstring (the text in triple quotes) that describes what it does. The framework passes all this metadata to the LLM, which uses it to decide what the tool does and whether to use it.

In [51]:
def query_resume(q: str) -> str:
    """Answers questions about a specific resume."""
    # we're using the query engine we already created above
    response = query_engine.query(f"This is a question about the specific resume we have in our database: {q}")
    return response.response
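
To see the kind of metadata that can be read off a function like this, the sketch below uses Python's standard `inspect` module. It is an illustration only and does not claim to reproduce what LlamaIndex does internally. `describe_function` and `lookup_resume_fact` are hypothetical names, and the stub body stands in for the real query-engine call.

```python
import inspect
from typing import Any, Callable, Dict


def describe_function(fn: Callable[..., Any]) -> Dict[str, str]:
    """Collect the name, signature, and docstring of a function."""
    return {
        "name": fn.__name__,
        "signature": str(inspect.signature(fn)),
        "description": inspect.getdoc(fn) or "",
    }


def lookup_resume_fact(q: str) -> str:
    """Answers questions about a specific resume."""
    return "stub answer"  # placeholder; the real tool would call the query engine


print(describe_function(lookup_resume_fact))
```

A vague name or a missing docstring leaves the LLM with little to go on when it has to choose between tools, which is why the naming advice above matters.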

The next step is to create the actual tool. There's a utility function, `FunctionTool.from_defaults`, to do this for you.

In [52]:
resume_tool = FunctionTool.from_defaults(fn=query_resume)

Now you can instantiate a `FunctionCallingAgent` using that tool. There are a number of different agent types supported by LlamaIndex; this one is particularly capable and efficient.

You pass it a list of tools (just one in this case), give it the same LLM we instantiated earlier, and set `verbose=True` so you get a little more info on what your agent is up to.

In [53]:
agent = FunctionCallingAgent.from_tools(
    tools=[resume_tool],
    llm=llm,
    verbose=True
)

Now you can chat with the agent! Let's ask it a quick question about our applicant.

In [54]:
response = agent.chat("How many years of experience does the applicant have?")
print(response)

> Running step fbb18c4a-f194-41b7-ad1c-a025e1f74533. Step input: How many years of experience does the applicant have?
Added user message to memory: How many years of experience does the applicant have?
=== Calling Function ===
Calling function: query_resume with args: {"q": "How many years of experience does the applicant have?"}
=== Function Output ===
The applicant has over 6 years of experience as a Full Stack Web Developer. They worked as a Junior Web Developer from June 2017 to February 2019, as a Full Stack Developer from March 2019 to December 2021, and as a Senior Full Stack Developer from January 2022 to the present.
> Running step bf9d2230-b88a-4e41-977f-f41adbba501f. Step input: None
=== LLM Response ===
The applicant has over 6 years of experience as a Full Stack Web Developer.
The applicant has over 6 years of experience as a Full Stack Web Developer.


You can see the agent getting the question, adding it to its memory, picking a tool, calling it with appropriate arguments, and getting the output back.
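
The "calling function ... with args" line in the log suggests the basic mechanics: the LLM names a tool and supplies its arguments as JSON, and the framework looks the tool up and calls it. The following is a rough sketch of that dispatch step under stated assumptions. It is not LlamaIndex's implementation, and `call_tool` and `echo_question` are hypothetical names with a stub tool in place of the real RAG query.

```python
import json
from typing import Callable, Dict


def call_tool(tools: Dict[str, Callable[..., str]], name: str, args_json: str) -> str:
    """Look up a tool by name and call it with JSON-encoded keyword arguments."""
    if name not in tools:
        raise KeyError(f"Unknown tool: {name}")
    kwargs = json.loads(args_json)
    return tools[name](**kwargs)


def echo_question(q: str) -> str:
    """Stub tool that just echoes the question it was given."""
    return f"received: {q}"


tools = {"echo_question": echo_question}
print(call_tool(tools, "echo_question", '{"q": "How many years of experience?"}'))
```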

## Wrapping the Agentic RAG into a Workflow

You now have a RAG pipeline and an agent. Next, create a similar agentic RAG from scratch using a workflow, which you'll extend in later lessons. The workflow does not reuse the query engine or agent objects created above.

Here's the workflow you will create:
<img width="400" src="images/rag_workflow.png">

It consists of two steps:
1. `set_up` which is triggered by `StartEvent` and emits `QueryEvent`: at this step, the RAG system is set up and the query is passed to the second step;
2. `ask_question` which is triggered by `QueryEvent` and emits `StopEvent`: here the response to the query is generated using the RAG query engine.
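
The two steps above follow an event-driven pattern: each step is triggered by one event type and emits the next. The toy sketch below imitates that pattern with plain `asyncio` and dataclasses so the control flow is easy to see. It is an assumption-laden illustration, not the LlamaIndex `Workflow` machinery, and every `Toy*` and `toy_*` name is hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict


@dataclass
class ToyStartEvent:
    query: str


@dataclass
class ToyQueryEvent:
    query: str


@dataclass
class ToyStopEvent:
    result: str


async def toy_set_up(ev: ToyStartEvent) -> ToyQueryEvent:
    # A real set_up step would build or load the index here.
    return ToyQueryEvent(query=ev.query)


async def toy_ask_question(ev: ToyQueryEvent) -> ToyStopEvent:
    # A real step would call the query engine; this one returns a canned string.
    return ToyStopEvent(result=f"answer to: {ev.query}")


# Map each event type to the step it triggers.
TOY_STEPS: Dict[type, Callable[[Any], Awaitable[Any]]] = {
    ToyStartEvent: toy_set_up,
    ToyQueryEvent: toy_ask_question,
}


async def run_toy_workflow(query: str) -> str:
    """Dispatch events to steps until a stop event is produced."""
    event: Any = ToyStartEvent(query=query)
    while not isinstance(event, ToyStopEvent):
        event = await TOY_STEPS[type(event)](event)
    return event.result


if __name__ == "__main__":
    print(asyncio.run(run_toy_workflow("Where did the applicant first work?")))
```

Inside a notebook cell you could instead write `await run_toy_workflow("...")`, mirroring how the real workflow is run later in this lesson.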

In [60]:
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core import VectorStoreIndex

embedding_model = AzureOpenAIEmbedding(model_name="text-embedding-ada-002")
llm = AzureOpenAI(engine="gpt-4", model="gpt-4")

In [61]:
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
    Context
)

In [62]:
class QueryEvent(Event):
    query: str

In [68]:
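from llama_index.core.base.base_query_engine import BaseQueryEngine

class RAGWorkflow(Workflow):
    storage_dir = "./storage"
    llm: AzureOpenAI
    # as_query_engine() returns a query engine, not an index
    query_engine: BaseQueryEngine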

    # the first step will be setup
    @step
    async def set_up(self, ctx: Context, ev: StartEvent) -> QueryEvent:

        if not ev.resume_file:
            raise ValueError("No resume file provided")

        # define an LLM to work with
        self.llm = llm

        # ingest the data and set up the query engine
        if os.path.exists(self.storage_dir):
            # you've already ingested your documents
            storage_context = StorageContext.from_defaults(persist_dir=self.storage_dir)
            index = load_index_from_storage(storage_context)
        else:
            # parse and load your documents
            documents = LlamaParse(
                result_type="markdown",
                content_guideline_instruction="This is a resume, gather related facts together and format it as bullet points with headers"
            ).load_data(ev.resume_file)
            # embed and index the documents
            index = VectorStoreIndex.from_documents(
                documents,
                embed_model=embedding_model
            )
            index.storage_context.persist(persist_dir=self.storage_dir)

        # either way, create a query engine
        self.query_engine = index.as_query_engine(llm=self.llm, embed_model=embedding_model, similarity_top_k=5)

        # now fire off a query event to trigger the next step
        return QueryEvent(query=ev.query)

    # the second step will be to ask a question and return a result immediately
    @step
    async def ask_question(self, ctx: Context, ev: QueryEvent) -> StopEvent:
        response = self.query_engine.query(f"This is a question about the specific resume we have in our database: {ev.query}")
        return StopEvent(result=response.response)

You run it like before, giving it a fake resume we created for you.

In [69]:
w = RAGWorkflow(timeout=120, verbose=False)
result = await w.run(
    resume_file="./data/fake_resume.pdf",
    query="Where is the first place the applicant worked?"
)
print(result)

The first place the applicant worked, according to the resume, was at StartupHub in San Jose, CA.


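Note: there's a small bug here. If you run this a second time with a different resume, the code finds the index already persisted in `./storage` for the old resume and skips parsing the new one, so queries are still answered from the old resume.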
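
One possible way around this, sketched below, is to derive the storage directory from a hash of the resume file's contents, so that each distinct file gets its own persisted index. This is a suggestion, not part of the original lesson, and `content_fingerprint` and `storage_dir_for` are hypothetical helpers. Inside `set_up` you might then use `storage_dir_for(ev.resume_file)` in place of the fixed `self.storage_dir`.

```python
import hashlib
from pathlib import Path


def content_fingerprint(data: bytes, length: int = 16) -> str:
    """Short, stable identifier for a blob of bytes (truncated SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()[:length]


def storage_dir_for(resume_file: str, base_dir: str = "./storage") -> Path:
    """Return a per-file storage directory based on the file's contents."""
    return Path(base_dir) / content_fingerprint(Path(resume_file).read_bytes())
```

Hashing the contents rather than the filename means a renamed copy of the same resume reuses the existing index, while an edited file with the same name gets re-parsed.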

## Workflow Visualization

You can visualize the workflow you just created.

In [70]:
WORKFLOW_FILE = "workflows/rag_workflow.html"
draw_all_possible_flows(w, filename=WORKFLOW_FILE)
html_content = extract_html_content(WORKFLOW_FILE)
display(HTML(html_content), metadata=dict(isolated=True))

<class 'NoneType'>
<class 'llama_index.core.workflow.events.StopEvent'>
<class '__main__.QueryEvent'>
workflows/rag_workflow.html
