# Basic RAG using LlamaIndex

Tutorial sources used:
- https://medium.com/@kofsitho/basic-tutorial-rag-with-llama-index-8927a5716dd1
- https://docs.llamaindex.ai/en/stable/understanding/loading/loading/

## Loading Documents

In [3]:
#%pip install nbconvert

In [9]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
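
`SimpleDirectoryReader` walks the given folder and returns a list of document objects, each carrying the extracted text plus file metadata. The sketch below imitates that idea with the standard library only, for plain `.txt` files. The helper `load_text_files` and the dict layout are hypothetical and are not part of LlamaIndex; the real reader handles many more file types.

```python
from pathlib import Path
import tempfile


def load_text_files(directory):
    """Read every .txt file under `directory` into a dict of text plus metadata.

    Hypothetical helper for illustration; not the LlamaIndex implementation.
    """
    docs = []
    for path in sorted(Path(directory).rglob("*.txt")):
        docs.append(
            {
                "text": path.read_text(encoding="utf-8"),
                "metadata": {"file_name": path.name, "file_path": str(path)},
            }
        )
    return docs


# Demo on a throwaway directory so the sketch runs without a real ./data folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first document", encoding="utf-8")
    Path(tmp, "b.txt").write_text("second document", encoding="utf-8")
    demo_docs = load_text_files(tmp)

print(len(demo_docs), demo_docs[0]["metadata"]["file_name"])
```

Keeping the file name next to the text is what later makes it possible to trace a retrieved chunk back to its source file.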

## Loading Documents into a VectorStoreIndex (Text to Vectors)

In [2]:
#%pip install llama-index-embeddings-cohere cohere

In [16]:
# Build a VectorStoreIndex, explicitly choosing the splitter (chunk_size, chunk_overlap) and the embedding model
import os
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.cohere import CohereEmbedding
from llama_index.core import VectorStoreIndex

embed_model = CohereEmbedding(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-v3.0"
)

node_parser = SentenceSplitter(chunk_size=256, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes, embed_model=embed_model, show_progress=True)

Generating embeddings: 100%|██████████| 103/103 [00:06<00:00, 17.14it/s]
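
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window chunker. It is only a sketch: `SentenceSplitter` also tries to respect sentence boundaries and measures length with a tokenizer, whereas the hypothetical `chunk_tokens` below just slices a pre-tokenized list.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` tokens.

    Hypothetical helper for illustration; it ignores sentence boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last token of the previous one, so text that straddles a chunk boundary still appears whole in at least one chunk. With `chunk_size=256` and `chunk_overlap=20`, as used above, the same idea applies at a larger scale.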


## Default Retriever

In [25]:
base_retriever = index.as_retriever(similarity_top_k=5)

source_nodes = base_retriever.retrieve("What is the RAG?")
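
Conceptually, the retriever embeds the query, scores every stored chunk vector against it, and keeps the `similarity_top_k` best matches. The sketch below shows that ranking step with cosine similarity on toy 2-D vectors. It assumes cosine similarity as the metric (to my understanding, the default for the in-memory store); `cosine_similarity`, `top_k`, and the vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the k (score, name) pairs with the highest similarity to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest score first
    return scored[:k]


# Toy "embeddings"; real ones come from the embedding model.
toy_chunks = {
    "chunk_a": [1.0, 0.0],
    "chunk_b": [0.0, 1.0],
    "chunk_c": [1.0, 1.0],
}
results = top_k([1.0, 0.0], toy_chunks, k=2)
for score, name in results:
    print(f"{name}: {score:.3f}")
```

The `Score` printed for each retrieved node plays the same role as `score` here: a higher value means the chunk embedding is closer to the query embedding.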

In [26]:
# Inspect the retrieved nodes and their similarity scores

for node in source_nodes:
    # print(node.metadata)
    print("---------------------------------------------")
    print(f"Score: {node.score:.3f}")
    print(node.get_content())
    print("---------------------------------------------\n\n")

---------------------------------------------
Score: 0.498
RAG-Token The RAG-Token model can be seen as a standard, autoregressive seq2seq genera-
tor with transition probability: p′
θ(yi|x,y1:i−1) = ∑
z∈top-k(p(·|x)) pη(zi|x)pθ(yi|x,zi,y1:i−1) To
decode, we can plug p′
θ(yi|x,y1:i−1) into a standard beam decoder.
RAG-Sequence For RAG-Sequence, the likelihood p(y|x) does not break into a conventional per-
token likelihood, hence we cannot solve it with a single beam search. Instead, we run beam search for
each document z, scoring each hypothesis using pθ(yi|x,z,y 1:i−1). This yields a set of hypotheses
Y, some of which may not have appeared in the beams of all documents.
---------------------------------------------


---------------------------------------------
Score: 0.458
Since RAG can be
employed as a language model, similar concerns as for GPT-2 [50] are valid here, although arguably
to a lesser extent, including that it might be used to generate abuse, faked or misleading conten

## Query Engine

In [1]:
#%pip install llama-index-llms-cohere cohere

In [21]:
from llama_index.llms.cohere import Cohere

llm = Cohere(
    model="command-r",  # or command-r-plus if you have access
    temperature=0,
    api_key=os.environ["COHERE_API_KEY"],
    max_tokens=512,
)

query_engine = index.as_query_engine(
    streaming=True,
    similarity_top_k=2,
    llm=llm,
)
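
A query engine ties the two halves of RAG together: it retrieves the top-k chunks and then passes them to the LLM together with the question. The sketch below shows only the prompt-assembly step. The template wording and the helper `build_prompt` are hypothetical, not the prompt LlamaIndex actually sends.

```python
def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the user question into a single prompt string.

    Hypothetical template for illustration only.
    """
    context = "\n\n".join(context_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer: "
    )


# Stand-ins for the chunks a retriever would return.
toy_context = [
    "RAG combines a retriever with a generator.",
    "The retriever returns the top-k passages.",
]
prompt = build_prompt("What is RAG?", toy_context)
print(prompt)
```

With `similarity_top_k=2`, only two chunks reach the LLM, which keeps the prompt short but makes the answer depend heavily on retrieval quality.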

## Generating a Response from a Query

In [24]:
response = query_engine.query("What is the RAG Sequence model?")
response.print_response_stream()

The RAG Sequence model uses a single retrieved document to generate the complete sequence of target tokens. The retrieved document is treated as a latent variable which is marginalised to obtain the seq2seq probability, using a top-K approximation.
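
The marginalisation mentioned in the answer can be written out in the notation that appears in the retrieved chunks, with $p_\eta$ the retriever and $p_\theta$ the generator. This is a sketch of the standard RAG-Sequence formulation, reconstructed for reference and not copied from the indexed files:

```latex
p_{\text{RAG-Sequence}}(y \mid x)
  \approx \sum_{z \in \text{top-}k(p(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y \mid x, z)
  = \sum_{z \in \text{top-}k(p(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})
```

The same document $z$ conditions every token of the output, and the sum over the top-k documents is what the answer calls the top-K approximation.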