# RAG Bootcamp ◦ February 2024 ◦ Vector Institute 

In [1]:
##################################################################
# Venue: RAG Bootcamp - Vector Institute Canada
# Talk: RAG Bootcamp: Intro to RAG with the LlamaIndex Framework
# Speaker: Andrei Fajardo
##################################################################

![Title Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/title.excalidraw.svg)

![Framework Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/framework.excalidraw.svg)

#### Notebook Setup & Dependency Installation

In [2]:
%pip install llama-index llama-index-vector-stores-qdrant -q

[notice] A new release of pip available: 22.3.1 -> 24.0
[notice] To update, run: python3.11 -m pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


In [3]:
import nest_asyncio

nest_asyncio.apply()

In [4]:
!mkdir -p data
!wget "https://arxiv.org/pdf/2402.09353.pdf" -O "./data/dorav1.pdf"

--2024-02-28 13:05:08--  https://arxiv.org/pdf/2402.09353.pdf
Resolving arxiv.org (arxiv.org)... 151.101.3.42, 151.101.131.42, 151.101.67.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.3.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 604050 (590K) [application/pdf]
Saving to: ‘./data/dorav1.pdf’


2024-02-28 13:05:08 (5.23 MB/s) - ‘./data/dorav1.pdf’ saved [604050/604050]



## Motivation

![Motivation Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/motivation.excalidraw.svg)


In [5]:
# query an LLM and ask it about DoRA
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")
response = llm.complete("What is DoRA?")

In [6]:
print(response.text)

Without specific context, it's hard to determine which "DoRA" you're referring to as it could mean different things in different fields. However, in the context of education, DoRA often stands for "Division of Research Administration" which is responsible for the oversight of research activities in an institution. Please provide more context.


## Basic RAG in 3 Steps

![Divider Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/subheading.excalidraw.svg)


1. Build external knowledge (i.e., upload up-to-date data sources)
2. Retrieve
3. Augment and Generate
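
The three steps above can be sketched end to end in plain Python. This is a toy illustration only, not how LlamaIndex implements them: the word-overlap scoring stands in for real embeddings, and `build_knowledge`, `retrieve`, `augment_and_generate`, and `echo_llm` are hypothetical names made up for this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_knowledge(texts):
    """Step 1: store each text next to a bag-of-words 'encoding'."""
    return [(text, Counter(tokenize(text))) for text in texts]


def retrieve(store, query, top_k=1):
    """Step 2: rank stored texts by word overlap with the query."""
    query_words = Counter(tokenize(query))
    ranked = sorted(
        store,
        key=lambda item: sum((item[1] & query_words).values()),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def augment_and_generate(query, context, llm):
    """Step 3: place the retrieved context in the prompt, then call the LLM."""
    context_str = "\n".join(context)
    prompt = f"Context:\n{context_str}\n\nQuery: {query}\nAnswer:"
    return llm(prompt)


def echo_llm(prompt):
    # Hypothetical stand-in for a real LLM call: it just returns the prompt.
    return prompt


store = build_knowledge([
    "DoRA decomposes pre-trained weights into magnitude and direction.",
    "Qdrant stores vectors for similarity search.",
])
context = retrieve(store, "What is DoRA?")
answer = augment_and_generate("What is DoRA?", context, echo_llm)
print(answer)
```

Swapping `echo_llm` for a real model call and the word-overlap score for embedding similarity gives the shape of the pipeline built in the rest of this notebook.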

## 1. Build External Knowledge

![Divider Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step1.excalidraw.svg)

In [7]:
"""Load the data.

With llama-index, before any transformations are applied,
data is loaded in the `Document` abstraction, which is
a container that holds the text of the document.
"""

from llama_index.core import SimpleDirectoryReader

loader = SimpleDirectoryReader(input_dir="./data")
documents = loader.load_data()

In [8]:
# if you want to see what the text looks like
documents[0].text

'DoRA: Weight-Decomposed Low-Rank Adaptation\nShih-Yang Liu* 1 2Chien-Yi Wang1Hongxu Yin1Pavlo Molchanov1Yu-Chiang Frank Wang1\nKwang-Ting Cheng2Min-Hung Chen1\nAbstract\nAmong the widely used parameter-efficient fine-\ntuning (PEFT) methods, LoRA and its variants\nhave gained considerable popularity because of\navoiding additional inference costs. However,\nthere still often exists an accuracy gap between\nthese methods and full fine-tuning (FT). In this\nwork, we first introduce a novel weight decom-\nposition analysis to investigate the inherent dif-\nferences between FT and LoRA. Aiming to re-\nsemble the learning capacity of FT from the\nfindings, we propose Weight- Decomposed L ow-\nRankAdaptation ( DoRA ). DoRA decomposes\nthe pre-trained weight into two components, mag-\nnitude anddirection , for fine-tuning, specifically\nemploying LoRA for directional updates to effi-\nciently minimize the number of trainable param-\neters. By employing DoRA, we enhance both\nthe learning cap

In [9]:
"""Chunk, Encode, and Store into a Vector Store.

To streamline the process, we can make use of the IngestionPipeline
class, which applies your specified transformations to the
Documents.
"""

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="test_store")

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store,
)
_nodes = pipeline.run(documents=documents, num_workers=4)
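
To give a feel for what the chunking step does, here is a simplified sketch that packs whole sentences into chunks and repeats the last sentence of each chunk at the start of the next. It is not the `SentenceSplitter` implementation; the function `chunk_sentences` and its `max_words` and `overlap_sentences` parameters are assumptions made up for this illustration, and it counts words rather than tokens.

```python
import re


def chunk_sentences(text, max_words=40, overlap_sentences=1):
    """Greedily pack whole sentences into chunks of roughly max_words words.

    The last `overlap_sentences` sentences of a chunk are repeated at the
    start of the next one, so context is not cut off at chunk boundaries.
    A single sentence longer than max_words still ends up in a chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = current + [sentence]
        if current and sum(len(s.split()) for s in candidate) > max_words:
            # The chunk is full: emit it and carry the overlap forward.
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
chunks = chunk_sentences(text, max_words=6, overlap_sentences=1)
for chunk in chunks:
    print(chunk)
```

In the real pipeline, each chunk would then be passed to the embedding model and the resulting vector stored alongside the chunk text.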

In [12]:
# if you want to see the nodes
# len(_nodes)
# _nodes[0].text

In [13]:
"""Create a llama-index... wait for it... Index.

After uploading your encoded documents into your vector
store of choice, you can connect to it with a VectorStoreIndex
which then gives you access to all of the llama-index functionality.
"""

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

## 2. Retrieve Against A Query

![Step2 Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step2.excalidraw.svg)

In [14]:
"""Retrieve relevant documents against a query.

With our Index ready, we can now query it to
retrieve the most relevant document chunks.
"""

retriever = index.as_retriever(similarity_top_k=2)
retrieved_nodes = retriever.retrieve("What is DoRA?")

In [17]:
# to view the retrieved node
# print(retrieved_nodes[0].text)
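
Conceptually, retrieval with `similarity_top_k=2` amounts to scoring every stored chunk against the query embedding and keeping the two best. The sketch below shows that idea with hand-written three-dimensional vectors, assuming cosine similarity as the metric. The vectors, the `top_k` helper, and the chunk labels are all made up for illustration; a real vector store uses much larger embeddings and typically an approximate search index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the k (score, text) pairs most similar to query_vector."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in stored
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy "embeddings": made-up numbers, not output of a real embedding model.
stored = [
    ("chunk about DoRA", [0.9, 0.1, 0.0]),
    ("chunk about LoRA", [0.7, 0.3, 0.1]),
    ("chunk about pip", [0.0, 0.1, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], stored, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```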

## 3. Generate Final Response

![Step3 Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step3.excalidraw.svg)

In [18]:
"""Context-Augmented Generation.

With our Index ready, we can create a QueryEngine
that handles the retrieval and context augmentation
in order to get the final response.
"""

query_engine = index.as_query_engine()

In [19]:
# to inspect the default prompt being used
print(
    query_engine.get_prompts()["response_synthesizer:text_qa_template"]
    .default_template.template
)

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 
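
To make the augmentation step concrete, the template printed above can be filled in by hand with plain string formatting. This is only a sketch of the idea: the `build_prompt` helper and the choice to join chunks with a blank line are assumptions, and the actual response synthesizer may pack or split context differently.

```python
# The template text is copied from the output above.
TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_prompt(retrieved_texts, query):
    """Join the retrieved chunks and substitute them into the template."""
    context_str = "\n\n".join(retrieved_texts)
    return TEXT_QA_TEMPLATE.format(context_str=context_str, query_str=query)


prompt = build_prompt(
    [
        "DoRA decomposes the pre-trained weight into magnitude and direction.",
        "DoRA employs LoRA for directional updates.",
    ],
    "What is DoRA?",
)
print(prompt)
```

The resulting string is what gets sent to the LLM in place of the bare question, which is why the answer can now draw on the paper.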


In [20]:
response = query_engine.query("What is DoRA?")
print(response)

DoRA is a method that enhances the learning capacity of LoRA by introducing incremental directional updates that can be adapted by different LoRA variants. It can replace the concept of incremental directional updates with alternative LoRA variants, such as VeRA, which suggests freezing a unique pair of random low-rank matrices shared across all layers and using minimal layer-specific trainable scaling vectors to capture each layer's incremental updates. This approach significantly reduces the number of trainable parameters while maintaining accuracy. DoRA has been shown to consistently outperform LoRA across various rank settings for commonsense reasoning tasks, indicating its effectiveness in improving accuracy with fewer trainable parameters.


## In Summary

- LLMs, as powerful as they are, don't perform well on knowledge-intensive tasks (domain-specific, updated data, long-tail)
- LLMs with context augmentation have been shown (in a few studies) to outperform LLMs without it
- In this notebook, we showed one example of that pattern: GPT-4 alone could not say what DoRA is, but it could once the paper was supplied as context.

# LlamaIndex Has More To Offer

- Data infrastructure that enables production-grade, advanced RAG systems
- Agentic solutions
- Newly released: `llama-index-networks`
- Enterprise offerings (alpha):
    - LlamaParse (proprietary complex PDF parser) and
    - LlamaCloud

### Useful links

[website](https://www.llamaindex.ai/) ◦ [llamahub](https://llamahub.ai) ◦ [github](https://github.com/run-llama/llama_index) ◦ [medium](https://medium.com/@llama_index) ◦ [rag-bootcamp-poster](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/final_poster.excalidraw.svg)