### Creating a RAG system using LlamaIndex to query different PDF files
- The data is stored in the `Data` directory
- The `Data` directory contains two papers about few-shot learning
- A `.env` file in the local directory holds the OpenAI API key (the code below reads it under the name `API_KEY`)
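The notebook depends on both of these being in place, so a small pre-flight check can turn a missing folder or key into a clear error instead of a failure deep inside indexing. This is a hypothetical helper (`check_setup` is not part of LlamaIndex or python-dotenv). It assumes `load_dotenv()` has already been called so the key from `.env` is in the environment, and it only looks for `.pdf` files, whereas `SimpleDirectoryReader` also reads other file types.

```python
import os
from pathlib import Path


def check_setup(data_dir: str = "Data", key_name: str = "API_KEY") -> list[Path]:
    """Hypothetical pre-flight check: fail early with a clear message."""
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Data directory not found: {folder.resolve()}")
    # Only the PDFs are listed here; SimpleDirectoryReader would read other files too
    pdfs = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")
    if not pdfs:
        raise FileNotFoundError(f"No PDF files in {folder}")
    # os.getenv returns None when the variable is missing (e.g. load_dotenv() not called)
    if not os.getenv(key_name):
        raise KeyError(f"Environment variable {key_name!r} is not set")
    return pdfs
```

In the notebook it could be called right after `load_dotenv()`, for example `check_setup("Data", "API_KEY")`.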




In [33]:
```python
import os

from dotenv import load_dotenv
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.response.pprint_utils import pprint_response
```

In [34]:
```python
def reading_indexing_files(data_dir, API_KEY_NAME):
    # Load the variables from the .env file into the environment
    load_dotenv()
    # Copy the key into OPENAI_API_KEY, the variable the OpenAI client reads
    api_key = os.getenv(API_KEY_NAME)
    if api_key is None:
        raise KeyError(f"{API_KEY_NAME} is not set in the environment or the .env file")
    os.environ["OPENAI_API_KEY"] = api_key
    # Read every file in data_dir into Document objects (with file metadata)
    files = SimpleDirectoryReader(data_dir).load_data()
    # Split the documents into nodes, embed them and build the vector index
    index = VectorStoreIndex.from_documents(files, show_progress=True)
    # Return the documents and the index for further use
    return files, index


files, index = reading_indexing_files(data_dir="Data", API_KEY_NAME="API_KEY")
```


Parsing nodes: 100%|██████████| 43/43 [00:00<00:00, 200.19it/s]
Generating embeddings: 100%|██████████| 86/86 [00:02<00:00, 38.53it/s]
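The two progress bars reflect the two stages of `VectorStoreIndex.from_documents`: the loaded documents (43 here; LlamaIndex's PDF reader typically produces one document per page) are split into nodes, and each node (86 here) gets an embedding from the OpenAI embedding model. The toy sketch below mimics that data flow in plain Python only. The word-window chunker and the word-count "embedding" are hypothetical stand-ins, not LlamaIndex's node parser or a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (assumes overlap < chunk_size)."""
    words = text.split()
    step = chunk_size - overlap  # how far each new window moves forward
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def embed(text: str) -> dict[str, float]:
    """Toy 'embedding': L2-normalised word counts (real systems call an embedding model)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def build_index(documents: list[str]) -> list[tuple[str, dict[str, float]]]:
    """Documents -> nodes (chunks) -> one vector per node, all kept in memory."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_words(doc)]
```

Because each document can yield several chunks, the number of embedded nodes is usually larger than the number of documents, as in the 43 to 86 jump above.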


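### Setting up the query engine and its parameters
- The retriever takes the index and `similarity_top_k`, the number of most similar nodes (text chunks) to retrieve for each query
- The postprocessor discards retrieved nodes whose similarity score is below `similarity_cutoff` (a similarity score such as 0.80, not a percentage)
- The query engine chains the two: it retrieves the top-k nodes, filters them with the postprocessor, and passes the remaining nodes to the LLM to generate the answer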
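The interaction between the two parameters can be sketched in plain Python: the retriever ranks nodes by similarity and keeps the top `similarity_top_k`, then the cutoff removes any of those scoring below `similarity_cutoff`, so the LLM may see fewer than k nodes, or none at all. The sketch assumes cosine similarity over plain lists of floats, which is what LlamaIndex's default in-memory vector store uses; `cosine` and `retrieve` are hypothetical names, not LlamaIndex functions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: list[float], node_vecs: dict[str, list[float]],
             similarity_top_k: int, similarity_cutoff: float) -> list[tuple[str, float]]:
    # Retriever step: score every node and keep the similarity_top_k best
    scored = sorted(((cosine(query_vec, vec), node_id) for node_id, vec in node_vecs.items()),
                    reverse=True)
    top_k = scored[:similarity_top_k]
    # Postprocessor step: drop nodes scoring below the cutoff (may leave fewer than k, or none)
    return [(node_id, score) for score, node_id in top_k if score >= similarity_cutoff]
```

With `similarity_top_k=4` and `similarity_cutoff=0.80`, as in this notebook, at most four nodes reach the LLM, and only those scoring at least 0.80.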


In [35]:
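```python
def setting_query_engine(index, similarity_top_k, similarity_cutoff):
    # Retrieve the similarity_top_k nodes closest to the query embedding
    retriever = VectorIndexRetriever(index=index, similarity_top_k=similarity_top_k)
    # Drop retrieved nodes whose similarity score is below similarity_cutoff
    postprocessor = SimilarityPostprocessor(similarity_cutoff=similarity_cutoff)
    # Retrieve -> filter -> generate the answer with the LLM
    query_engine = RetrieverQueryEngine(retriever=retriever, node_postprocessors=[postprocessor])
    return query_engine


query_engine = setting_query_engine(index, similarity_top_k=4, similarity_cutoff=0.80)
```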
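After filtering, the query engine hands the surviving node texts to a response synthesizer, which places them in a prompt together with the question. The sketch below shows that idea with a hypothetical template and a simple character budget. LlamaIndex's real default prompts are worded differently, and its default `compact` response mode refines the answer over several LLM calls when the context does not fit, rather than dropping chunks. If the cutoff removes every node, LlamaIndex returns an empty response without calling the LLM, which the sketch mirrors by returning `None`.

```python
from typing import List, Optional

# Hypothetical prompt template; LlamaIndex's default QA prompt is worded differently
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only the context above, answer the query.\n"
    "Query: {query}\n"
    "Answer: "
)


def build_prompt(query: str, node_texts: List[str], max_chars: int = 4000) -> Optional[str]:
    """Pack retrieved node texts (best first) into one prompt within a character budget."""
    if not node_texts:
        return None  # every node was filtered out: nothing to send to the LLM
    context: List[str] = []
    used = 0
    for text in node_texts:
        # Stop once the budget would be exceeded, but always keep the best-scoring node
        if context and used + len(text) > max_chars:
            break
        context.append(text)
        used += len(text)
    return QA_TEMPLATE.format(context="\n\n".join(context), query=query)
```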

### Handling a query through the query engine
#### Function `querying_read_from_storage`
- Takes as a parameter the storage directory where the index is persisted on disk, so that later runs can load it instead of re-parsing the files and regenerating the embeddings
- If the storage directory exists, the index is loaded from it; otherwise the directory is created and the index is saved into it
- It then runs the query through the query engine created above, with the chosen parameters
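The persist-or-load logic follows a common caching pattern: do the expensive work once, save it, and reload it on later runs. In this notebook, however, `reading_indexing_files` regenerates the embeddings before the storage check runs, so the check has to come before the build for the saved copy to avoid any work (with LlamaIndex, that means calling `load_index_from_storage` when the directory exists, and `VectorStoreIndex.from_documents` plus `persist` only when it does not). The sketch below shows that ordering with JSON and a stand-in `build` callable; `load_or_build` and the `index.json` filename are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def load_or_build(storage_dir: str, build: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: run the expensive build only when no saved copy exists."""
    path = Path(storage_dir) / "index.json"
    if path.exists():
        # Later runs: reload the saved result; build() is never called
        return json.loads(path.read_text())
    # First run: build (in the notebook: parse the PDFs and embed the nodes), then save
    data = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data
```

Calling it twice with the same directory runs `build` only on the first call; the second call reads the file.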

In [36]:
```python
def querying_read_from_storage(storage_dir, index, query_engine, query):
    # Check whether a persisted copy of the index already exists
    if not os.path.exists(storage_dir):
        # First run: save the index (documents, nodes and embeddings) to disk
        index.storage_context.persist(persist_dir=storage_dir)
    else:
        # Later runs: load the existing index from disk.
        # Note: query_engine was built from the in-memory index, so the reloaded
        # index is only used if a new query engine is created from it.
        storage_context = StorageContext.from_defaults(persist_dir=storage_dir)
        index = load_index_from_storage(storage_context)

    # Either way, run the query through the query engine
    response = query_engine.query(query)
    return response


response = querying_read_from_storage(
    storage_dir="./storage", index=index, query_engine=query_engine, query="Explain Few SHot Learning?"
)
pprint_response(response, show_source=True)
```

Final Response: Few-shot learning (FSL) is a learning method that
involves rapidly acquiring valid information from just a few or even
zero samples. It is inspired by human reasoning capabilities and is
commonly applied in edge computing scenarios. FSL aims to address the
challenge of effectively learning from small sample datasets or across
different domains. This learning approach has shown great potential in
various fields, especially in scenarios where traditional data-driven
algorithms struggle due to limited data availability or domain
variations.
______________________________________________________________________
Source Node 1/4
Node ID: 4ad0c169-faea-4e1a-8341-a5569f6e54c8
Similarity: 0.8668009876908989
Text: 3 TABLE 2 A List Of Key Acronyms NOMENCLATURE Full Form
Abbreviation Full Form Abbreviation Artiﬁcial Intelligence AI Few-Shot
Learning FSL Deep Learning DL Machine Learning ML Zero-Shot Learning
ZSL One-Shot Learning OSL Neural Architecture Search NAS Conventional
Neur