Retrieval-Augmented Generation

In [2]:
# Load environment variables from the .env file so they are available via os.environ
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
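# load_dotenv() has already exported the key; this assignment raises TypeError if the key is missing
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')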
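
The assignment above fails with an opaque TypeError when the key is absent, because os.environ only accepts strings. A minimal sketch of a more explicit check follows; the helper name require_env is hypothetical and not part of python-dotenv or llama_index.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    `require_env` is an illustrative helper, not a library function.
    """
    value = os.environ.get(name)
    if not value:
        # Covers both a missing variable and an empty string.
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

Calling require_env('OPENAI_API_KEY') right after load_dotenv() would surface a missing key before any index is built.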

In [4]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()


In [5]:
documents

[Document(id_='3ecabc92-8b6b-4645-8d31-bba4bdec36d4', embedding=None, metadata={'page_label': '1', 'file_name': 'user-guide.pdf', 'file_path': 'data\\user-guide.pdf', 'file_type': 'application/pdf', 'file_size': 211916, 'creation_date': '2024-02-09', 'last_modified_date': '2024-02-09', 'last_accessed_date': '2024-02-09'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='  banking on  people  \n \n \n \n \n \nDear Valued Customer,  \n \nPlease  read the instructions below before completing the KYC, FATCA & CRS forms in respective to \nyour account type.  \n \nPersonal/ Individual Account:  \n \n\uf0b7 The account holder (s) is / are required to complete and sign the Individual KYC  form in line \nwith his/her information. In the case of joint account relat

In [7]:
index=VectorStoreIndex.from_documents(documents,show_progress=True)

Parsing nodes:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

In [8]:
index

<llama_index.indices.vector_store.base.VectorStoreIndex at 0x2129db9ac50>

In [9]:
query_engine = index.as_query_engine()

In [10]:
query_engine  # retrieves the documents relevant to a query and generates an answer from them

<llama_index.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x212a8bd9150>

In [15]:
response = query_engine.query('what is Individual KYC form?')

In [16]:
print(response)

The Individual KYC form is a form that needs to be completed and signed by the account holder(s) for personal/individual accounts. It collects and verifies the account holder's information, such as their Emirate ID card copy, passport copy, and updated income proof. It also includes the account holder's self-certification/declaration for FATCA and CRS compliance. In the case of a joint account, each account holder is required to fill and sign a separate form.


In [19]:
# Display the response in a more readable format, including its source nodes

from llama_index.response.pprint_utils import pprint_response
pprint_response(response,show_source=True)


print(response)

Final Response: The Individual KYC form is a form that needs to be
completed and signed by the account holder(s) for personal/individual
accounts. It collects and verifies the account holder's information,
such as their Emirate ID card copy, passport copy, and updated income
proof. It also includes the account holder's self-
certification/declaration for FATCA and CRS compliance. In the case of
a joint account, each account holder is required to fill and sign a
separate form.
______________________________________________________________________
Source Node 1/2
Node ID: 521b8bf4-0dd0-47e4-a930-07b345b70a8d
Similarity: 0.8476141455153147
Text: banking on  people             Dear Valued Customer,     Please
read the instructions below before completing the KYC, FATCA & CRS
forms in respective to  your account type.     Personal/ Individual
Account:      The account holder (s) is / are required to complete
and sign the Individual KYC  form in line  with his/her information.
In the cas...
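
The pretty-printed output is convenient to read, but the same information can be collected programmatically. The sketch below assumes the response object exposes source_nodes, and that each entry has a score and a node with get_content() and metadata, as appears to be the case in the llama_index version used here. The helper summarize_sources and the stand-in objects are hypothetical and exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=80):
    """Collect score, page label, and a short text preview per source node."""
    rows = []
    for item in response.source_nodes:
        text = item.node.get_content()
        rows.append({
            "score": item.score,
            "page": item.node.metadata.get("page_label"),
            # Collapse runs of whitespace so the preview fits on one line.
            "preview": " ".join(text.split())[:max_chars],
        })
    return rows


# Stand-in objects that mimic the assumed response shape (not real library types).
fake_node = SimpleNamespace(
    get_content=lambda: "banking on  people \n Dear Valued Customer",
    metadata={"page_label": "1"},
)
fake_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=fake_node, score=0.85)]
)
rows = summarize_sources(fake_response)
```

With a real response from query_engine.query(...), the same function would be called as summarize_sources(response).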

In [38]:
# Build a query engine from a custom retriever (top 4 matches) and a similarity cutoff

from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.indices.postprocessor import SimilarityPostprocessor


retriever = VectorIndexRetriever(index=index,similarity_top_k=4)
postprocessor=SimilarityPostprocessor(similarity_cutoff=0.40)

query_engine = RetrieverQueryEngine(retriever=retriever, node_postprocessors=[postprocessor])
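
To make the two settings concrete: similarity_top_k limits how many chunks the retriever returns, and similarity_cutoff then discards any of those whose score falls below the threshold. The following is a toy model of that two-stage filter in plain Python, using hand-made 2-D vectors in place of real embeddings. It is a sketch of the idea, not the library's implementation, and the function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, top_k=4, cutoff=0.40):
    """Keep the top_k most similar chunks, then drop those scoring below cutoff."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:top_k]                                     # retriever stage
    return [(s, t) for s, t in top if s >= cutoff]           # postprocessor stage


# Made-up vectors standing in for embeddings.
chunks = [
    ("kyc form", [1.0, 0.0]),
    ("fatca", [0.6, 0.8]),
    ("unrelated", [0.0, 1.0]),
]
hits = retrieve([1.0, 0.0], chunks)
```

Here the third chunk is within the top 4 but is removed by the cutoff, which is why a query can return fewer nodes than similarity_top_k.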

In [39]:
response = query_engine.query('what is KYC?')

In [40]:
# Display the response in a more readable format, including its source nodes

from llama_index.response.pprint_utils import pprint_response
pprint_response(response,show_source=True)
print(response)

Final Response: KYC stands for Know Your Customer. It is a process
that financial institutions and other businesses use to verify the
identity of their customers. The purpose of KYC is to prevent identity
theft, fraud, money laundering, and other illegal activities. It
involves collecting and verifying personal information and
documentation from customers, such as identification cards, passports,
proof of address, and other relevant documents.
______________________________________________________________________
Source Node 1/2
Node ID: 521b8bf4-0dd0-47e4-a930-07b345b70a8d
Similarity: 0.7975573151988193
Text: banking on  people             Dear Valued Customer,     Please
read the instructions below before completing the KYC, FATCA & CRS
forms in respective to  your account type.     Personal/ Individual
Account:      The account holder (s) is / are required to complete
and sign the Individual KYC  form in line  with his/her information.
In the cas...
________________________________

In [41]:
import os.path
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# Reuse a persisted index if one exists, so embeddings are not recomputed on every run
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is KYC?")
print(response)

KYC stands for Know Your Customer.
