<a href="https://colab.research.google.com/github/rahiakela/transformers-research-and-practice/blob/main/llamindex-projects/01_rag_with_llamaindex_openai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##RAG System using LlamaIndex with OpenAI

In [None]:
## Retrieval augmented generation

import os
from dotenv import load_dotenv
load_dotenv()

In [None]:
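# load_dotenv() has already exported the key from .env; fail early with a clear message if it is missing
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set; add it to your .env file"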

In [None]:
# Note: these import paths match llama_index releases before 0.10;
# later releases moved them under llama_index.core.
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.indices.postprocessor import SimilarityPostprocessor
from llama_index.response.pprint_utils import pprint_response

##Load documents

In [None]:
documents=SimpleDirectoryReader("data").load_data()

In [None]:
documents

##Vector Index

In [None]:
index=VectorStoreIndex.from_documents(documents,show_progress=True)

Parsing nodes: 100%|██████████| 25/25 [00:00<00:00, 247.44it/s]
Generating embeddings: 100%|██████████| 36/36 [00:03<00:00, 11.11it/s]
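
The progress output above appears to show 25 loaded documents turning into 36 embedded nodes, which is what happens when longer documents are split into several chunks before embedding. The following is a minimal sketch of that idea only, assuming a plain word-based splitter with overlap. `split_into_chunks`, `chunk_size`, and `overlap` are hypothetical names for illustration; LlamaIndex's own node parser works on tokens and sentence boundaries, so its chunks will differ.

```python
def split_into_chunks(words, chunk_size=5, overlap=2):
    """Split a list of words into overlapping chunks (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


text = "the transformer is based solely on attention mechanisms without recurrence"
chunks = split_into_chunks(text.split())
for chunk in chunks:
    print(" ".join(chunk))
```

The overlap means the last words of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully visible in at least one chunk.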


In [None]:
index

<llama_index.indices.vector_store.base.VectorStoreIndex at 0x21c577edf90>

##Query vector index

In [None]:
query_engine=index.as_query_engine()

In [None]:
retriever=VectorIndexRetriever(index=index,similarity_top_k=4)
postprocessor=SimilarityPostprocessor(similarity_cutoff=0.80)

query_engine=RetrieverQueryEngine(retriever=retriever, node_postprocessors=[postprocessor])
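
The retriever and postprocessor above act as two filters in sequence: the retriever keeps the `similarity_top_k` highest-scoring nodes, and the postprocessor then drops any of those that score below `similarity_cutoff`. A minimal pure-Python sketch of that two-stage selection follows. The node names and scores are made up for illustration, and `retrieve_top_k` and `apply_cutoff` are hypothetical helpers, not LlamaIndex functions.

```python
def retrieve_top_k(scored_nodes, k):
    """Return the k highest-scoring (name, score) pairs, best first."""
    return sorted(scored_nodes, key=lambda pair: pair[1], reverse=True)[:k]


def apply_cutoff(scored_nodes, cutoff):
    """Keep only pairs whose score is at least the cutoff."""
    return [pair for pair in scored_nodes if pair[1] >= cutoff]


# Made-up scores, for illustration only
scored = [("node_a", 0.81), ("node_b", 0.79), ("node_c", 0.62),
          ("node_d", 0.77), ("node_e", 0.75)]

top_4 = retrieve_top_k(scored, k=4)      # stage 1: retriever
kept = apply_cutoff(top_4, cutoff=0.80)  # stage 2: postprocessor
print(top_4)
print(kept)
```

With a cutoff as strict as 0.80, few nodes may survive the second stage; lowering the cutoff passes more context to the LLM at the cost of including less relevant text.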

In [None]:
response=query_engine.query("What is attention is all you need?")

In [None]:
pprint_response(response,show_source=True)
print(response)

Final Response: The paper "Attention Is All You Need" proposes a new
network architecture called the Transformer. This architecture is
based solely on attention mechanisms and does not use recurrent or
convolutional neural networks. The paper demonstrates that the
Transformer models outperform existing models in terms of quality,
parallelizability, and training time. The Transformer achieves state-
of-the-art results in machine translation tasks and generalizes well
to other tasks such as English constituency parsing.
______________________________________________________________________
Source Node 1/1
Node ID: be144ab8-cb0a-44fa-af69-3dbfe555e41a
Similarity: 0.8107415810551661
Text: Provided proper attribution is provided, Google hereby grants
permission to reproduce the tables and figures in this paper solely
for use in journalistic or scholarly works. Attention Is All You Need
Ashish Vaswani∗ Google Brain avaswani@google.comNoam Shazeer∗ Google
Brain noam@google.comNiki Parmar∗ Goo
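
Only one source node is listed even though the retriever was asked for four, which is consistent with the 0.80 cutoff discarding the other three. The `Similarity` value is a score between the query embedding and the node embedding. Assuming it is cosine similarity (the usual choice for an in-memory vector store), it can be sketched as below; the three-dimensional vectors are toy values, not real embeddings, and `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query_vec = [1.0, 0.0, 1.0]  # toy query embedding
node_vec = [1.0, 1.0, 1.0]   # toy node embedding
score = cosine_similarity(query_vec, node_vec)
print(round(score, 4))
```

Because the score depends only on the angle between the vectors, not their lengths, identical directions score 1.0 and orthogonal directions score 0.0.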

##Save vector index

In [None]:
# check if storage already exists
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What are transformers?")
print(response)

Transformers are a model architecture that rely entirely on an attention mechanism to draw global dependencies between input and output. They eschew recurrence and do not use sequence-aligned RNNs or convolution. Transformers allow for significantly more parallelization and have been shown to achieve state-of-the-art results in tasks such as translation. They can be trained faster than architectures based on recurrent or convolutional layers.
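
The save cell above follows a build-once, load-afterwards pattern: the expensive step (embedding every document) runs only when `./storage` does not exist yet. A minimal sketch of the same pattern, with a JSON file standing in for the persisted index, is shown below. `load_or_build` and `expensive_build` are hypothetical names, and nothing here touches LlamaIndex.

```python
import json
import os
import tempfile


def expensive_build():
    """Stand-in for embedding documents and building the index."""
    return {"nodes": ["chunk one", "chunk two"]}


def load_or_build(path):
    """Load cached data from path if present, otherwise build and save it."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f), "loaded"
    data = expensive_build()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data, "built"


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "index.json")
    first, first_source = load_or_build(cache_path)    # nothing cached yet
    second, second_source = load_or_build(cache_path)  # cache file now exists

print(first_source, second_source)
```

One consequence of this pattern: if the files in `data` change, the existing `./storage` directory is still loaded as-is, so it has to be deleted (or the index rebuilt) for the changes to be picked up.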
