# 🛠️ Implementation of RAG using LlamaIndex

**This notebook implements a simple RAG system with LlamaIndex and compares the results of two small LLMs: TinyLlama-1.1B and Zephyr-7b-gemma-v0.1.<br>It uses FAISS as the vector store and a sentence-transformers model as the embedding model.**


## 🧰 Install required dependencies
* LlamaIndex
* LangChain to use a custom embedding model with LlamaIndex
* Faiss as the vector store
* sentence-transformers as an embedding model
* torch to run the LLM using a GPU

In [None]:
! pip install llama-index==0.10.18
! pip install langchain==0.1.11
! pip install faiss-gpu
! pip install sentence-transformers
! pip install torch==2.2.1
! pip install accelerate
! pip install pypdf
! pip install llama-index-vector-stores-faiss
! pip install llama-index-embeddings-langchain
! pip install llama-index-embeddings-huggingface
! pip install llama-index-llms-huggingface

## 📥 Import modules

In [8]:
#Core LlamaIndex
from llama_index.core.node_parser import SentenceSplitter

from llama_index.core import (
    SimpleDirectoryReader,
    load_index_from_storage,
    VectorStoreIndex,
    StorageContext,
    Settings
)

#Vector storage
import faiss
from llama_index.vector_stores.faiss import FaissVectorStore

#Embedding model
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

#LLM
import torch
from llama_index.llms.huggingface import HuggingFaceLLM

#Other useful modules
from pathlib import Path

## 🔄 Load the document


### The document is downloaded from the Internet. Change the URL to use another document.

In [None]:
!wget -O quantum.pdf  https://www.dst.defence.gov.au/sites/default/files/events/documents/Quantum%20Computing%20Insights%20Paper.pdf

In [None]:
reader = SimpleDirectoryReader(
    input_files=["quantum.pdf"]
)
documents = reader.load_data()

print('Number of pages:', len(documents))
print(documents)

## 🧩 Create nodes

In LlamaIndex, documents are transformed into nodes, which are smaller data units (i.e. chunks).
We use a node parser that tries to keep sentences and paragraphs together. The maximum chunk size is set to 512 tokens, with an overlap of 20 tokens when a paragraph has to be split.


In [11]:
import pprint

# Initialize the parser
parser = SentenceSplitter.from_defaults(chunk_size=512, chunk_overlap=20)

# Parse documents into nodes
nodes = parser.get_nodes_from_documents(documents)
print(f"Number of nodes created: {len(nodes)}")

# Show the first three nodes
pprint.pprint(nodes[:3])

Number of nodes created: 102
[TextNode(id_='1350f00d-c1a6-44ce-ab06-4c72e7dc5d1c', embedding=None, metadata={'page_label': '1', 'file_name': 'quantum.pdf', 'file_path': 'quantum.pdf', 'file_type': 'application/pdf', 'file_size': 1962431, 'creation_date': '2024-03-12', 'last_modified_date': '2022-07-25'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='5c69ea8f-9699-472d-83e2-4737658da046', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '1', 'file_name': 'quantum.pdf', 'file_path': 'quantum.pdf', 'file_type': 'application/pdf', 'file_size': 1962431, 'creation_date': '2024-03-12', 'last_modified_date': '2022-07-25'}, hash='d9b7731ae566d47707c6cd1f6bf9c74049f15203b9228c62ffeeac73457ab24f'), <Nod
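
The splitting behaviour described above (keep sentences together, cap the chunk size, carry a small overlap into the next chunk) can be approximated with a few lines of plain Python. This is only a sketch and not the `SentenceSplitter` implementation: the function name `split_with_overlap` is hypothetical, and whitespace-separated words stand in for tokens.

```python
import re


def split_with_overlap(text, chunk_size=512, chunk_overlap=20):
    """Greedy sentence-aware chunking; whitespace words stand in for tokens."""
    # Split after sentence-ending punctuation followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # Close the current chunk if this sentence would push it past the limit
        if current and len(current) + len(words) > chunk_size:
            chunks.append(" ".join(current))
            # Carry the tail of the closed chunk over as overlap
            current = current[-chunk_overlap:] if chunk_overlap else []
        # Sentences are never cut, so a very long one can exceed chunk_size
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


demo = "One two three. Four five six. Seven eight nine."
print(split_with_overlap(demo, chunk_size=6, chunk_overlap=2))
```

With these toy values, the second chunk starts with the last two words of the first one, which is the role the 20-token overlap plays in the notebook.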

## 🛢 Store the embeddings of the nodes into a vector store (Faiss)

In order to query the nodes, we extract their embeddings and store them in a vector store. We will use Faiss here but many other options are available.

For the embedding model, we use all-mpnet-base-v2 from sentence-transformers, an open-source model that produces 768-dimensional embeddings.

In [None]:
# Create a Faiss index. 768 is the dimensionality of the embeddings produced by all-mpnet-base-v2
faiss_index = faiss.IndexFlatL2(768)

# Load the embedding model and register it globally
Settings.embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

# No ServiceContext is needed: llama-index 0.10 reads the embedding model from Settings

# Create the vector store and its storage context
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Embed the documents and add them to the index.
# Note: from_documents splits the documents again with the default settings;
# the nodes created in the previous step are not reused here.
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Save the index to disk. It is stored in ./storage by default
index.storage_context.persist()
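
To make the role of the flat L2 index more concrete, here is a minimal pure-Python sketch of the search it performs: compare the query vector with every stored vector and keep the closest ones. The names `squared_l2` and `flat_l2_search` are hypothetical, and the 2-dimensional vectors are toy data rather than real embeddings. Faiss reports squared Euclidean distances for `IndexFlatL2`, so the sketch does the same; a lower value means a closer match.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(stored, query, k=2):
    """Return the k (distance, position) pairs closest to query, nearest first."""
    scored = sorted((squared_l2(vec, query), pos) for pos, vec in enumerate(stored))
    return scored[:k]


# Toy "embeddings" (2 dimensions instead of 768)
stored = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
hits = flat_l2_search(stored, [0.9, 0.1], k=2)
print(hits)  # nearest vector first: position 1, then position 0
```

A flat index is exhaustive, so it is exact but scales linearly with the number of stored vectors, which is fine for a single document like this one.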


## 🧠 Instantiate the LLM

Choose one of these two.

## To use TinyLlama-1.1B

In [None]:
Settings.llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=512,
    generate_kwargs={"do_sample": False},  # greedy decoding; temperature only applies when do_sample=True
    tokenizer_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    tokenizer_kwargs={"max_length": 2048},
    model_kwargs={"torch_dtype": torch.float16}
)

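## To use Zephyr-7b-gemma-v0.1

In [None]:
# Note: this cell loads zephyr-7b-beta. To load the Gemma-based model named in the
# heading, set both names to "HuggingFaceH4/zephyr-7b-gemma-v0.1".
Settings.llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=512,
    generate_kwargs={"do_sample": False},  # greedy decoding; temperature only applies when do_sample=True
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_kwargs={"max_length": 2048},
    model_kwargs={"torch_dtype": torch.float16}
)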
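
Both configurations use `context_window=2048` and `max_new_tokens=512`. Assuming the context window has to hold the prompt and the generated answer together, the arithmetic below shows how much room is left for retrieved text. The helper names and the 100-token allowance for the question and prompt template are made up for illustration.

```python
def context_budget(context_window, max_new_tokens, prompt_overhead=0):
    """Tokens left for retrieved text once room for the answer is reserved."""
    return context_window - max_new_tokens - prompt_overhead


def chunks_that_fit(budget, chunk_size):
    """Number of full chunks of chunk_size tokens that fit in the budget."""
    return max(budget, 0) // chunk_size


# prompt_overhead=100 is an assumed allowance for the question and the prompt template
budget = context_budget(2048, 512, prompt_overhead=100)
print(budget, chunks_that_fit(budget, 512))
```

Under these assumptions, two 512-token chunks fit next to the question. Larger chunks or more retrieved nodes would have to be truncated or handled over several LLM calls.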

## 🤔 Query the document

We reload the index from its storage context. This is not strictly needed here, but it is useful when querying happens later than index creation.

Then, we define a query engine with this index and ask our question.

In [16]:
import time

stored_index = load_index_from_storage(storage_context)
query_engine = stored_index.as_query_engine()

prompt = "What is quantum computing?"
# prompt = "What will quantum computing be capable of in 2040?"
# prompt = "What are the ethical challenges related to quantum computing?"

# Time the full query: retrieval plus answer generation
t0 = time.time()
response = query_engine.query(prompt)
print(f"Time: {time.time()-t0}")



Time: 1.3071348667144775
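
When querying happens in a later session, the in-memory `storage_context` no longer exists and the index has to be rebuilt from the files written by `index.storage_context.persist()`. The sketch below follows the pattern shown in the LlamaIndex Faiss example. The wrapper name `reload_faiss_index` is hypothetical, and it assumes the default `./storage` directory and that `Settings.embed_model` has been set to the same embedding model before querying.

```python
def reload_faiss_index(persist_dir="./storage"):
    """Rebuild the index from the files written by index.storage_context.persist()."""
    # Imports live inside the function so that defining it does not require the packages
    from llama_index.core import StorageContext, load_index_from_storage
    from llama_index.vector_stores.faiss import FaissVectorStore

    # Load the Faiss index itself, then the document and index stores around it
    vector_store = FaissVectorStore.from_persist_dir(persist_dir)
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store, persist_dir=persist_dir
    )
    return load_index_from_storage(storage_context=storage_context)
```

A later session would then call `reload_faiss_index().as_query_engine()` instead of reusing the objects created above.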


## 🔎 Display response and sources in a table

In [17]:
import pandas as pd
from IPython.display import display, HTML


pd.set_option("display.max_colwidth", None)  # None shows the full cell text; -1 is deprecated


def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n", "<br>")))


def visualize_retrieved_nodes(nodes) -> None:
    result_dicts = []
    for node in nodes:
        result_dict = {"Score": node.score, "Text": node.node.get_text()}
        result_dicts.append(result_dict)

    pretty_print(pd.DataFrame(result_dicts))


print(response.response)

# Use a separate name so the parsed nodes created earlier are not overwritten
source_nodes = response.source_nodes
visualize_retrieved_nodes(source_nodes)

Quantum computing is a general model for modelling information and how information sciences work. It is considered more accurate and more tied to the physical world than what has been used before.




Index,Score,Text
0,0.612567,"16 EMERGING DISRUPTIVE TECHNOLOGY ASSESSMENT SYMPOSIUMIn broad terms the layers indicated in Figure 1 are: Algorithms and Applications. In this layer, applications are mapped into a set of quantum computer algorithms which together solve a specified problem. However, it needs to be acknowledged that quantum computers are co-processors, and that the vast majority of applications will require orchestration of quantum and classical computers working together. Software, Compilation, and Control. This layer includes a quantum programming language, which the algorithms required are translated into:  +A compiler that maps these programs to logical qubit operations  +A scheduler and optimiser for the logical qubit operations +Error correction firmware within which logical qubits are mapped to underlying physical qubits. Hardware. This layer includes physical level schedulers, optimisers, and device control firmware. Qubits. In this layer specific pulses are produced which control each qubit. In addition this layer includes readout of qubit values. These values are fed back up to higher layers to implement fault tolerant error correction and calculation output. Networking and Integration. Communication and quantum state transfer between quantum computers for distributed computation and communication, including integration with hybrid systems and networks. Further explanatory information regarding quantum computing and ‘the stack’ is contained in Annex B."
1,0.67968,"22 EMERGING DISRUPTIVE TECHNOLOGY ASSESSMENT SYMPOSIUMDespite being at a relatively early stage of development, some argue that quantum computing has already changed computer science. For example, quantum computing has stimulated debates in computing and information sciences as to whether Alan Turing’s (computational) model of what a computer is – which is what we have in the form of classical computers – is the most complete model of what computing could be or should be. With the advent of the quantum computing model it is now apparent that this is not the case, especially if you take efficiencies into account. Quantum computing is a more general model for modelling information and how information sciences work. It is considered more accurate and more tied to the physical world than what has been used before. As such, quantum computing has contested previous thinking regarding information and computing theory. Many envisage quantum computing as a means to augment classical computers, not to replace them. Indeed, there will be a symbiosis. Development in quantum computing will improve classical computers e.g. classical computer programmers have derived better algorithms for classical computers from quantum algorithms. As such, classical computers will become more powerful alongside quantum computers. In some ways quantum computers are analogous to the turbo in a motor car, where it isn’t needed all the time, but operates when it is required to enhance performance. Stand-alone quantum computers will also require classical computers to operate them. In the coming years impacts from quantum computing may be seen in investment portfolio optimisation from financial institutions and other entities seeking high risk, but high pay off ventures. 
Similarly, impacts may be seen in encryption for secure transactions, which can potentially be offset by the upgrading to post quantum cryptography, but as quantum computing hardware emerges this will need to be continually monitored. Changes could be seen in the landscape of industry, from advanced manufacturing through to increased simulation capability, the study of new molecules within health sciences, improvements in robotics and autonomous systems, or new, currently unknown industries could emerge."
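
A note on reading the `Score` column: assuming the scores are the raw values returned by the `IndexFlatL2` index (squared L2 distances) and that the embeddings are unit-length, a lower score means a closer match, and the score can be converted to cosine similarity with `1 - d/2`. Both assumptions should be checked for your setup; the function name below is hypothetical.

```python
def cosine_from_squared_l2(squared_distance):
    """Cosine similarity of two unit vectors given their squared L2 distance."""
    # For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
    return 1.0 - squared_distance / 2.0


# Scores taken from the table above
for score in (0.612567, 0.67968):
    print(score, "->", round(cosine_from_squared_l2(score), 4))
```

Under these assumptions the first row (0.612567) is the closer of the two retrieved passages, even though the second one contains the sentence the answer was drawn from.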
