# RAG with LlamaIndex

In this notebook, we build a RAG pipeline with LlamaIndex. The notebook has the following three sections:

1.   Understanding Retrieval Augmented Generation (RAG).
2.   Building RAG with LlamaIndex.
3.   Storing the Index on Disk.

In RAG, your data is first loaded and prepared for queries, a step called “indexing”. A user query runs against the index, which filters your data down to the most relevant context. That context and the query are then sent to the LLM together with a prompt, and the LLM generates a response based on the retrieved context.
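The retrieval half of that flow can be sketched without any libraries. This is a toy illustration, not how LlamaIndex works internally: word counts stand in for a real embedding model, a plain list stands in for the vector store, and `embed`, `cosine`, `build_index`, and `retrieve` are hypothetical helper names. The sample chunks are sentences taken from answers shown later in this notebook.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts, standing in for a real embedding model
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    # "Indexing": store every chunk next to its vector
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, top_k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top_k best
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    "We offer a variety of beauty services in our stores.",
    "Online booking opens 30 days in advance.",
    "Beauty services are not available at Emelie at Kohl's locations.",
]
toy_index = build_index(chunks)
print(retrieve(toy_index, "When does online booking open?", top_k=1))
# prints ['Online booking opens 30 days in advance.']
```

The retrieved chunks are the "context" that is then handed to the LLM along with the query.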

# Business Statement

An ecommerce chatbot simulates an in-store human assistant and aims to replicate that experience online. You can use one in your ecommerce store to provide real-time customer service, improve the shopping experience, market your products, and boost your sales. 
These bots can:
 - Increase conversion rates
 - Generate more leads
 - Boost sales
 - Provide 24/7 instant support
 
In fact, industry research reports that chatbots can increase conversion rates by as much as 67%. In addition, almost 33% of shoppers say long waiting times are the most frustrating part of a customer service experience. Instant messaging lets you assist shoppers quickly while increasing your revenue. 
The rest of this notebook applies this idea to Emelie, a beauty retailer, by building a chatbot that answers customer questions from Emelie's documents.

# Nest Asyncio

We apply `nest_asyncio` so that async code can run inside the notebook, whose event loop is already running.
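Why the patch is needed: Jupyter already runs an asyncio event loop, and plain asyncio refuses to start another loop from inside a running one, which is what synchronous wrappers around async code may try to do. The standalone script below reproduces that `RuntimeError`; run it with plain `python`, not inside the notebook (where the kernel's own loop would make the outer `asyncio.run` fail first). `answer` and `notebook_cell` are hypothetical names. `nest_asyncio.apply()` patches asyncio so that such nested calls succeed instead.

```python
import asyncio


async def answer() -> int:
    await asyncio.sleep(0)
    return 42


async def notebook_cell() -> str:
    # Inside this coroutine an event loop is already running, as it is in Jupyter
    coro = answer()
    try:
        asyncio.run(coro)  # what a synchronous wrapper around async code might do
    except RuntimeError as err:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
        return str(err)
    return "no error"


print(asyncio.run(notebook_cell()))
```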

```python
import nest_asyncio

nest_asyncio.apply()
```

## Provide OpenAI API Key

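```python
import os
from getpass import getpass

# Prompt for the key only if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
```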

Load the documents. `SimpleDirectoryReader` reads the files in the `data` directory into a list of `Document` objects.

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
```
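Conceptually, loading turns each file into a record holding its text plus some metadata. The helper below is a hypothetical, simplified stand-in that only handles `.txt` files directly inside a directory; the real `SimpleDirectoryReader` also parses other formats (such as PDF) and attaches more metadata. The file name `faq.txt` is made up for the example.

```python
import tempfile
from pathlib import Path


def load_text_files(input_dir: str) -> list[dict]:
    # Hypothetical helper: one record per .txt file directly inside input_dir (no recursion)
    records = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        records.append(
            {
                "text": path.read_text(encoding="utf-8"),
                "metadata": {"file_name": path.name},
            }
        )
    return records


# Usage with a throwaway directory standing in for ./data
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "faq.txt").write_text("Online booking opens 30 days in advance.", encoding="utf-8")
    docs = load_text_files(tmp)

print(docs)
```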

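Define the LLM and register it as the default for every index and query engine created afterwards.

```python
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
# Without this, the index and query engines below would ignore `llm` and use LlamaIndex's default
set_global_service_context(ServiceContext.from_defaults(llm=llm))
```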

Next, build a simple RAG pipeline: split the documents into nodes of at most 512 tokens and index them in a vector store.

```python
from llama_index.node_parser import SimpleNodeParser

# Split the documents into nodes (chunks) of at most 512 tokens
node_parser = SimpleNodeParser.from_defaults(chunk_size=512)
nodes = node_parser.get_nodes_from_documents(documents)

# Embed every node and keep the vectors in an in-memory vector index
vector_index = VectorStoreIndex(nodes)
```
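`SimpleNodeParser` measures `chunk_size` in tokens, tries to respect sentence boundaries, and lets consecutive chunks overlap slightly so that a sentence cut at a boundary still appears whole in one chunk. The sliding-window sketch below approximates tokens with whitespace-separated words, and its default overlap of 20 is an assumption; `chunk_words` is a hypothetical helper that illustrates the idea rather than reproducing LlamaIndex's splitter.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 20) -> list[str]:
    """Split text into windows of at most chunk_size words; neighbours share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=3, chunk_overlap=1))
# prints ['a b c', 'c d e', 'e f g']
```

Smaller chunks make retrieval more precise but give the LLM less surrounding context per chunk, which is the trade-off `chunk_size=512` is tuning.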


Build a QueryEngine and start querying.

```python
query_engine = vector_index.as_query_engine()
```
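`as_query_engine()` combines a retriever, which fetches the nodes most similar to the question (`similarity_top_k=2` by default), with a response synthesizer that places their text into a prompt for the LLM. The prompt step can be sketched as follows; `build_prompt` and its template wording are hypothetical, not LlamaIndex's built-in prompt.

```python
def build_prompt(context_chunks: list[str], query: str) -> str:
    # Hypothetical QA template: retrieved context first, then the user's question
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


prompt = build_prompt(
    [
        "Online booking opens 30 days in advance.",
        "You may also call or visit the store to schedule.",
    ],
    "How do I book my appointment?",
)
print(prompt)
```

To give the LLM more context per question, you can retrieve more nodes, for example with `vector_index.as_query_engine(similarity_top_k=3)`, at the cost of a longer prompt.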

```python
response_vector = query_engine.query("What services do you offer?")
```

Check the response text.

```python
response_vector.response
```

'We offer a variety of beauty services in our stores. The menu of services may vary by location. To see what services are available in your local store, please visit our website at www.emelie.com/happening.'

```python
response_vector = query_engine.query("What do I do when I arrive for my service?")
```

```python
response_vector.response
```

'When you arrive for your service, you should head to the Beauty or Skincare Studio area to check in or ask any Beauty Advisor. Prior to beginning your service, the Beauty Advisor will take you through a pre-screening safety questionnaire. You will also be asked to review and sign a digital waiver, with a legal guardian completing the waiver on behalf of a minor.'

```python
response_vector = query_engine.query("Do all stores offer services?")
```

```python
response_vector.response
```

"Beauty services are available in most stores in accordance with state and local ordinances. However, beauty services are not available at Emelie at Kohl's locations. To check the availability of beauty services in your store, please visit emelie.com."

# Storing your index

By default, the data you just loaded is stored in memory as a series of vector embeddings. You can save time (and requests to OpenAI) by saving the embeddings to disk. That can be done with this line:

```python
vector_index.storage_context.persist()
```

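By default, `persist()` writes the index to a directory named `storage` in the current working directory; pass a `persist_dir` argument to choose another location. The next cell loads the index from `./storage` if that directory exists, and otherwise builds the index and persists it there: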

```python
import os.path

from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # No saved index yet: load the documents, build the index, and persist it
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Reload the saved index instead of re-embedding the documents
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# Either way, the index can now be queried
query_engine = index.as_query_engine()
response = query_engine.query("How do I book my appointment?")
print(response)
```

The best way to book an appointment is by visiting Emelie.com/happening or within the Emelie app (by selecting the Stores tab). Additionally, you may call or visit the store to schedule. Our online booking opens 30 days in advance. If you need an appointment outside of that window, please contact your store directly to inquire.
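The load-or-build logic above is a general caching pattern: pay for the embeddings once, then reload them from disk on later runs. Below is a dependency-free sketch of that pattern, in which `load_or_build`, `vectors.json`, and `fake_embed` are hypothetical names and a JSON file stands in for LlamaIndex's own storage format.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(persist_dir: str, texts: list[str], embed) -> dict:
    """Reuse vectors saved in persist_dir, or compute them with embed() and save them."""
    store_path = Path(persist_dir) / "vectors.json"  # hypothetical file name
    if store_path.exists():
        # Cache hit: no embedding calls at all
        return json.loads(store_path.read_text(encoding="utf-8"))
    # Cache miss: embed every text once, then persist the result for next time
    store = {text: embed(text) for text in texts}
    store_path.parent.mkdir(parents=True, exist_ok=True)
    store_path.write_text(json.dumps(store), encoding="utf-8")
    return store


calls = []


def fake_embed(text: str) -> list[float]:
    # Stand-in for a paid embedding API call; records how often it is invoked
    calls.append(text)
    return [float(len(text))]


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(tmp, ["booking", "services"], fake_embed)
    second = load_or_build(tmp, ["booking", "services"], fake_embed)

print(len(calls))  # 2: the second call read vectors.json instead of re-embedding
```

Like the notebook's version, this only checks whether the directory exists, so it never notices when the files in `data` change; delete the directory to force a rebuild.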


# Conclusion

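In this notebook, we built a RAG pipeline with LlamaIndex: we loaded Emelie's documents, split them into 512-token nodes, indexed them in a vector store, answered customer questions with a query engine, and saved the index to disk so it can be reloaded without recomputing the embeddings.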