## Adding RAG to a workflow

Add a document database to a workflow.

In this section, you'll parse a resume with LlamaParse, load it into a vector store, and use the agent to run basic queries against the documents.

## Importing libraries

In [7]:
import asyncio
import os

from IPython.display import HTML, display
from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)
from llama_index.utils.workflow import draw_all_possible_flows


## Loading API keys

In [8]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [9]:
import os

def get_openai_api_key():
    """Retrieve the OpenAI API key from environment variables."""
    return os.getenv('OPEN_AI_KEY')

def get_llama_cloud_api_key():
    """Retrieve the Llama Cloud API key from environment variables."""
    return os.getenv('LLAMA_CLOUD_API')

def extract_html_content(filename):
    """Read an HTML file and wrap its content in a fixed-height (800px) div; overflowing content is clipped."""
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            html_content = file.read()
    except Exception as e:
        # Keep the original error attached so the traceback shows the real cause
        raise Exception(f"Error reading file: {str(e)}") from e
    # Wrap the HTML so the notebook renders it in a fixed-size box
    return f""" <div style="width: 100%; height: 800px; overflow: hidden;"> {html_content} </div>"""


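You need nested async for this to work, so let's enable it here. Jupyter already runs an asyncio event loop, and calling `asyncio.run()` from inside a cell would otherwise raise a `RuntimeError`. `nest_asyncio` patches asyncio so that event loops can be nested inside each other.

*Note:* In asynchronous programming, the event loop is the scheduler that repeatedly picks up pending tasks and callbacks and runs them, switching to another task whenever the current one is waiting (for example, on network I/O).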

In [10]:
import nest_asyncio
nest_asyncio.apply()
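To see what the event loop does, and why nesting needs a patch, here is a stdlib-only sketch (the function names are illustrative). Two coroutines scheduled with `asyncio.gather` take turns on one loop, and calling `asyncio.run()` from a coroutine that is already running on a loop fails with the same kind of `RuntimeError` you would hit in an unpatched notebook cell. Run it as a plain Python script: inside a notebook the loop is already running, so the top-level `asyncio.run()` calls only work after `nest_asyncio.apply()`, and once patched the nested call should succeed and print `no error`.

```python
import asyncio
from typing import List


async def worker(name: str, log: List[str]) -> None:
    for step_no in range(2):
        log.append(f"{name} step {step_no}")
        # Awaiting hands control back to the event loop, which can run another task meanwhile
        await asyncio.sleep(0.01)


async def interleave() -> List[str]:
    log: List[str] = []
    # gather() schedules both coroutines on the same loop, so their steps interleave
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


async def nested_run_attempt() -> str:
    # A loop is already running here, just like inside a Jupyter cell
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)
    except RuntimeError as err:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
        return str(err)
    return "no error"


log = asyncio.run(interleave())
print(log[:2])                            # both tasks start before either one finishes
print(asyncio.run(nested_run_attempt()))  # the error that nest_asyncio works around
```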

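You also need two API keys:
- an API key for your LLM provider, like the one you used earlier (this notebook reads a Groq key from the `GROQ_API` variable);
- a LlamaCloud API key, to use LlamaParse to parse the PDFs.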

LlamaParse is an advanced document parser that can read PDFs, Word files, PowerPoint decks, and Excel spreadsheets, and extract information from complicated PDFs into a form that LLMs find easy to understand.

In [11]:
llama_cloud_api_key = get_llama_cloud_api_key()
llm_api_key = os.getenv('GROQ_API')
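Both keys come from environment variables, and `os.getenv` silently returns `None` when a variable is missing, so a typo in `.env` only shows up later as an authentication error. A minimal fail-fast sketch, using a hypothetical helper `require_env` (not part of any library) and the variable names this notebook reads:

```python
import os
from typing import Dict


def require_env(*names: str) -> Dict[str, str]:
    """Hypothetical helper: return the requested environment variables or fail fast."""
    values = {name: os.getenv(name) for name in names}
    # Treat unset and empty variables the same way
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(
            f"Missing environment variables: {', '.join(missing)} (check your .env file)"
        )
    return values


# Usage in this notebook (after load_dotenv()):
# keys = require_env("LLAMA_CLOUD_API", "GROQ_API")
```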

## Performing Retrieval-Augmented Generation (RAG) on a Resume Document

### 1. Parsing the Resume Document 

Let's start by parsing a resume.

![image.png](attachment:image.png)

Using LlamaParse, we will transform the resume into a list of `Document` objects. By default, a `Document` object stores text along with some other attributes:
- metadata: a dictionary of annotations that can be appended to the text.
- relationships: a dictionary containing relationships to other Documents.
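To make these attributes concrete, here is a simplified stand-in built on a plain dataclass. `ToyDocument` and `text_for_llm` are hypothetical names, not LlamaIndex's API: the real class has many more fields, and its relationships map relationship types to node references rather than plain strings. The `"{key}: {value}"` lines loosely echo the `metadata_template` and `metadata_separator` fields that appear in the printed `Document` fields later in this section; the metadata values here are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyDocument:
    """Simplified, hypothetical stand-in for a LlamaIndex Document (illustration only)."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)       # annotations about the text
    relationships: Dict[str, str] = field(default_factory=dict)  # links to other documents, by id

    def text_for_llm(self) -> str:
        # Render metadata as "key: value" lines and prepend them to the text
        header = "\n".join(f"{key}: {value}" for key, value in self.metadata.items())
        return f"{header}\n\n{self.text}" if header else self.text


page = ToyDocument(
    text="# Sarah Chen",
    metadata={"file_name": "fake_resume.pdf", "page": 1},
    relationships={"next": "page-2-id"},
)
print(page.text_for_llm())
```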
  

You can tell LlamaParse what kind of document it's parsing, so that it will parse the contents more intelligently. In this case, you tell it that it's reading a resume.

In [12]:
from llama_parse import LlamaParse

In [13]:
documents = LlamaParse(
    api_key=llama_cloud_api_key,
    result_type="markdown",
    content_guideline_instruction="This is a resume, gather related facts together and format it as bullet points with headers"
).load_data(
    "data/fake_resume.pdf",
)

Started parsing the file under job_id 92241e84-9652-43dc-b0a2-c63a1ed90663


In [14]:
print(len(documents))

3


In [15]:
print(documents[0])

Doc ID: 62102dbf-f663-44ef-8013-7f3f8c712ca2
Text: # Sarah Chen  Email: sarah.chen@email.com  LinkedIn:
linkedin.com/in/sarahchen  GitHub: github.com/sarahcodes  Portfolio:
sarahchen.dev  Location: San Francisco, CA  # Professional Summary
Innovative Full Stack Web Developer with 6+ years of experience
crafting scalable web applications and microservices. Specialized in
React, Node.js, and clou...


In [19]:
# Each Document is a pydantic model, so iterating over it yields (field name, value) pairs
for doc in documents:
    for item in doc:
        print(item)

('id_', '62102dbf-f663-44ef-8013-7f3f8c712ca2')
('embedding', None)
('metadata', {})
('excluded_embed_metadata_keys', [])
('excluded_llm_metadata_keys', [])
('relationships', {})
('metadata_template', '{key}: {value}')
('metadata_separator', '\n')
('text_resource', MediaResource(embeddings=None, data=None, text="# Sarah Chen\n\nEmail: sarah.chen@email.com\n\nLinkedIn: linkedin.com/in/sarahchen\n\nGitHub: github.com/sarahcodes\n\nPortfolio: sarahchen.dev\n\nLocation: San Francisco, CA\n\n# Professional Summary\n\nInnovative Full Stack Web Developer with 6+ years of experience crafting scalable web applications and microservices. Specialized in React, Node.js, and cloud architecture. Proven track record of leading technical teams and implementing CI/CD pipelines that reduced deployment time by 40%. Passionate about clean code, accessibility, and mentoring junior developers.\n\n# Professional Experience\n\n# Senior Full Stack Developer\n\nTechFlow Solutions | San Francisco, CA January 202

In [21]:
print(documents[0].text)
print(documents[1].text)
print(documents[2].text)

# Sarah Chen

Email: sarah.chen@email.com

LinkedIn: linkedin.com/in/sarahchen

GitHub: github.com/sarahcodes

Portfolio: sarahchen.dev

Location: San Francisco, CA

# Professional Summary

Innovative Full Stack Web Developer with 6+ years of experience crafting scalable web applications and microservices. Specialized in React, Node.js, and cloud architecture. Proven track record of leading technical teams and implementing CI/CD pipelines that reduced deployment time by 40%. Passionate about clean code, accessibility, and mentoring junior developers.

# Professional Experience

# Senior Full Stack Developer

TechFlow Solutions | San Francisco, CA January 2022 - Present

- Architected and implemented a microservices-based e-commerce platform serving 100K+ daily users
- Led a team of 5 developers in rebuilding the company's flagship product using React and Node.js
- Implemented GraphQL API gateway that reduced API response times by 60%
- Established coding standards and review processes 

## Setting up the embedding model

In [4]:
# pip install llama-index llama-index-embeddings-huggingface sentence-transformers

In [5]:
%pip install --upgrade llama-index llama-index-embeddings-huggingface llama-index-llms-groq sentence-transformers datasets




In [2]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)


### 2. Creating a Vector Store Index

![image.png](attachment:image.png)

We'll now feed the `Document` objects to `VectorStoreIndex`. The `VectorStoreIndex` uses an embedding model to embed the text, i.e. turn it into vectors that you can search. Here that is the Hugging Face model `BAAI/bge-small-en-v1.5`, configured earlier through `Settings.embed_model`.

`VectorStoreIndex` returns an index object, a data structure that lets you quickly retrieve the context most relevant to a query. It's the core foundation for RAG use cases: you can build query engines and chat engines on top of an index, which enable question answering and chat over your data.
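The core idea of a vector index, embed once and then rank stored chunks by similarity to the query, can be sketched with the standard library alone. This toy swaps the Hugging Face model for a bag-of-words "embedding" and uses cosine similarity, a common choice for comparing vectors; `ToyVectorIndex` is a hypothetical class, not how `VectorStoreIndex` is implemented, and the chunks are short excerpts of the resume text.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (a real model returns dense float vectors)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    def __init__(self, chunks: List[str]) -> None:
        # Embed every chunk once, up front, like building the index
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, top_k: int = 2) -> List[Tuple[str, float]]:
        # Score every stored chunk against the query and keep the top_k best
        query_vec = embed(query)
        scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in self.entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


toy_index = ToyVectorIndex([
    "Sarah Chen, Full Stack Web Developer, San Francisco",
    "Senior Full Stack Developer at TechFlow Solutions, January 2022 - Present",
    "Implemented GraphQL API gateway that reduced API response times by 60%",
])
for text, score in toy_index.retrieve("Which company is the Senior Full Stack Developer role at?"):
    print(f"{score:.2f}  {text}")
```

A real embedding model also matches on meaning rather than exact words, which is why the notebook uses one instead of word counts.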

In [4]:
from llama_index.core import VectorStoreIndex

In [15]:
index = VectorStoreIndex.from_documents(
    documents
)

# The embedding model was already set globally via Settings.embed_model, so there is no need to pass it here

## Setting up the LLM

In [16]:
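import os
from llama_index.llms.groq import Groq
from llama_index.core import Settings

# The Groq key is stored in .env under GROQ_API; pass it to the client explicitly
llm = Groq(model="llama3-70b-8192", api_key=os.getenv("GROQ_API"))
# Register the LLM globally so components created without an llm argument use it
Settings.llm = llm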

In [17]:
query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)
response = query_engine.query("What is this person's name and what was their most recent job?")
print(response)

This person's name is Sarah Chen, and their most recent job was as a Senior Full Stack Developer at TechFlow Solutions.
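`similarity_top_k=5` sets how many of the most similar chunks the query engine retrieves before the LLM writes its answer. The sketch below shows the general retrieve-then-generate step: keep the top-k chunks and paste them into the prompt as context. `build_rag_prompt` is a hypothetical helper, the template is illustrative rather than LlamaIndex's exact default prompt, and the similarity scores are made up.

```python
from typing import List, Tuple


def build_rag_prompt(question: str, scored_chunks: List[Tuple[str, float]], top_k: int = 5) -> str:
    """Hypothetical helper: keep the top_k highest-scoring chunks and paste them into a prompt."""
    # Highest similarity first; slicing keeps at most top_k chunks (what similarity_top_k controls)
    best = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)[:top_k]
    context = "\n\n".join(text for text, _ in best)
    # Illustrative template, not LlamaIndex's exact default prompt
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


scored = [
    ("Implemented GraphQL API gateway that reduced API response times by 60%", 0.0),
    ("Sarah Chen, Full Stack Web Developer, San Francisco", 0.34),
    ("Senior Full Stack Developer at TechFlow Solutions, January 2022 - Present", 0.50),
]
prompt = build_rag_prompt("What is this person's name and what was their most recent job?", scored, top_k=2)
print(prompt)
```

A larger `top_k` gives the LLM more context to work with, at the cost of a longer prompt and a higher chance of including irrelevant chunks.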


## Storing the Index to Disk

Indexes can be persisted to disk, so you don't have to rebuild them (re-embedding every chunk) each time you rerun the notebook. In a production setting, you would probably use a hosted vector store of some kind. Let's save your index to disk.
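The "load if saved, otherwise build and save" idea behind persisting can be sketched without LlamaIndex. This toy caches made-up vectors in a JSON file with a hypothetical name (`toy_vectors.json`) and counts how often the expensive step runs; LlamaIndex's `persist()` writes its own set of files and stores the documents and index metadata as well as the vectors.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple

Vectors = Dict[str, List[float]]


def load_or_build(persist_dir: str, build: Callable[[], Vectors]) -> Tuple[Vectors, bool]:
    """Hypothetical helper: return (vectors, built_now), reusing a saved copy when one exists."""
    path = os.path.join(persist_dir, "toy_vectors.json")
    if os.path.exists(path):
        # Reuse what an earlier run saved instead of recomputing it
        with open(path, encoding="utf-8") as f:
            return json.load(f), False
    vectors = build()
    os.makedirs(persist_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vectors, f)
    return vectors, True


calls = []


def fake_embed_all() -> Vectors:
    # Stand-in for the slow step (parsing and embedding); records each call
    calls.append(1)
    return {"chunk-1": [0.1, 0.2, 0.3], "chunk-2": [0.0, 0.5, 0.5]}


with tempfile.TemporaryDirectory() as tmp:
    first, built_first = load_or_build(tmp, fake_embed_all)
    second, built_second = load_or_build(tmp, fake_embed_all)

print(built_first, built_second, len(calls))
```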

In [18]:
storage_dir = "./storage"

index.storage_context.persist(persist_dir=storage_dir)

In [19]:
from llama_index.core import StorageContext, load_index_from_storage

You can check whether your index has already been stored and, if it has, reload it from disk with the `load_index_from_storage` function, like this:

In [20]:
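# Check if the index is stored on disk
if os.path.exists(storage_dir):
    # Load the index from disk
    storage_context = StorageContext.from_defaults(persist_dir=storage_dir)
    restored_index = load_index_from_storage(storage_context)
else:
    # Otherwise build it from the parsed documents and persist it, so restored_index is always defined
    print("Index not found on disk; building it now.")
    restored_index = VectorStoreIndex.from_documents(documents)
    restored_index.storage_context.persist(persist_dir=storage_dir)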

In [21]:
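response = restored_index.as_query_engine(llm=llm, similarity_top_k=5).query("What is this person's name and what was their most recent job?")
print(response)

When this cell was run as `restored_index.as_query_engine().query(...)`, without the `llm` argument, it failed:

Retrying llama_index.llms.openai.base.OpenAI._chat in 0.9794322031823643 seconds as it raised APIConnectionError: Connection error..
Retrying llama_index.llms.openai.base.OpenAI._chat in 1.2285381864928593 seconds as it raised APIConnectionError: Connection error..

APIConnectionError: Connection error.

The retries come from `llama_index.llms.openai`, so the query engine was calling OpenAI (LlamaIndex's default LLM) instead of the Groq model set up earlier. Passing `llm=llm` explicitly, as in the first query, should keep the restored index on Groq.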