### This notebook explores whether you really need a custom Agent tool for calculations, or whether the LLM can handle them directly.

As part of this exercise, I crawled the Express Entry page, which lists the application fees, and saved it as a document: https://www.canada.ca/en/immigration-refugees-citizenship/services/immigrate-canada/express-entry.html. I then chunked the document into paragraphs and asked questions like this:

User Query: I have 3 dependent children and a spouse. How much will be my total applications fees?

This exercise tests whether the LLM can do basic math operations on its own, in which case we may not need a custom tool for them.


In [None]:
from llama_index.core import SimpleDirectoryReader

# Load your documents
documents = SimpleDirectoryReader(input_files=["../data/docs/immigration-refugees-citizenship_services_immigrate-canada_express-entry.txt"]).load_data()

### Chunk documents with a custom splitter

In [None]:
from llama_index.core.text_splitter import TokenTextSplitter

class ParagraphSplitter(TokenTextSplitter):
    def split_text(self, text):
        """
        Split text into paragraphs on three consecutive newlines
        (that is, paragraphs separated by two empty lines).
        """
        # Trim surrounding whitespace and drop empty chunks
        paragraphs = [p.strip() for p in text.split('\n\n\n') if p.strip()]
        return paragraphs

splitter = ParagraphSplitter()


### Example to check how the custom split works on raw text

In [16]:

text = """This is para1.


This is para2.

This is parA3"""

nodes = splitter.split_text(text)

for p in nodes:
    print(p)
    print ("-------------------")

This is para1.
-------------------
This is para2.

This is parA3
-------------------
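
The splitter above breaks only on the exact string `'\n\n\n'`, so "empty" lines that contain spaces, or files with Windows line endings, would not count as a paragraph break. A minimal sketch of a more tolerant variant, assuming that behavior is wanted; the helper name `split_paragraphs` and the regex are illustrative and not part of this notebook's pipeline:

```python
import re

# Hypothetical helper: treat three or more line breaks as a paragraph break,
# allowing spaces/tabs on the "empty" lines and an optional \r before each \n.
_PARA_BREAK = re.compile(r"(?:\r?\n[ \t]*){3,}")

def split_paragraphs(text):
    # Trim each chunk and drop chunks that are empty after trimming
    return [p.strip() for p in _PARA_BREAK.split(text) if p.strip()]

# Same sample as above, but the first "empty" line contains a space
sample = "This is para1.\n \n\nThis is para2.\n\nThis is parA3"
for p in split_paragraphs(sample):
    print(repr(p))
```

As with the original splitter, a single empty line (two newlines) does not start a new paragraph.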


### Split the documents

In [None]:

nodes = splitter.get_nodes_from_documents(documents)


# Apply splitter to each document
all_paragraphs = []
for doc in documents:
    nodes = splitter.split_text(doc.text)
    all_paragraphs.extend(nodes)

### Storing embeddings of the paragraph chunks in a FAISS vector store


In [None]:
# Import the embedding model class used to encode documents into embeddings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


In [None]:
from llama_index.vector_stores.faiss import FaissVectorStore
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
import faiss


In [None]:
# 1. Load documents
#documents = SimpleDirectoryReader('path/to/docs').load_data()
from llama_index.core.text_splitter import SentenceSplitter
from llama_index.core.schema import TextNode

documents = SimpleDirectoryReader(input_files=["../data/docs/immigration-refugees-citizenship_services_immigrate-canada_express-entry.txt"]).load_data()

# 2. Split into paragraph strings
splitter = ParagraphSplitter()
all_nodes = []

for doc in documents:
    paragraphs = splitter.split_text(doc.text)
    # Convert to Node objects
    nodes = [TextNode(text=p) for p in paragraphs]
    all_nodes.extend(nodes)


# 3. Set up embedding model
# embed_model = OpenAIEmbedding()  # or your own embedding model
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

# 4. Create FAISS vector store and embed
# Create FAISS index
dimension = 384  # Embedding dimension of all-MiniLM-L6-v2
faiss_index = faiss.IndexFlatL2(dimension)

# Construct the FaissVectorStore
vector_store = FaissVectorStore(faiss_index=faiss_index)


# Create a StorageContext to use with LlamaIndex
storage_context = StorageContext.from_defaults(vector_store=vector_store)


# Build the index from all collected nodes, not only the last document's
index = VectorStoreIndex(
    all_nodes,
    storage_context=storage_context,
    embed_model=embed_model,
)

# 5. Persist FAISS index (optional)
# faiss.write_index(index.vector_store.faiss_index, "faiss_index_chunks.idx")

In [36]:
nodes

['This is content for https://www.canada.ca/en/immigration-refugees-citizenship/services/immigrate-canada/express-entry',
 'You are here:\n - Canada.ca\n - Immigration and citizenship\n - Immigrate to Canada',
 "Express Entry\n \nExpress Entry is an online system that we use to manage immigration applications from skilled workers.\n \nThere are 3 immigration programs managed through Express Entry:\n - Canadian Experience Class\n - Federal Skilled Worker Program\n - Federal Skilled Trades Program\n \nHow the Express Entry process works:\n - Create a profile and enter the pool.\n - We'll invite the candidates with the most points in rounds.\n - If you're invited to apply, fill out the application.\n - We'll review your application and make a decision.",
 'Processing times\n \nVaries by program (Refer page: /en/immigration-refugees-citizenship/services/application/check-processing-times.html)',
 'Fees\n \nYour application: $CAN\xa01,525\n \nFees for your family members:\n - Spouse: $CAN\x

In [41]:
retriever = index.as_retriever()

In [42]:
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama2", request_timeout=100.0)

# Query function
def query_rag_system(question):
    # Retrieve the chunks most similar to the question
    retrieved_docs = retriever.retrieve(question)
    print("retrived nodes: ")
    for doc in retrieved_docs:
        print(doc)
        print("----------------")

    # Join the retrieved chunk texts into one context block for the prompt
    context = "\n".join([doc.text for doc in retrieved_docs])
    prompt = f"Context:\n{context}\n\n Question: {question}\n\n\n\nAnswer:"
    response = llm.complete(prompt)
    return response

In [43]:
response = query_rag_system("application fees")

retrived nodes: 
Node ID: 68b89711-b98e-410d-90cf-baa63b2be82d
Text: Fees   Your application: $CAN 1,525   Fees for your family
members:  - Spouse: $CAN 1,525  - Dependent child: $CAN 260
Score:  0.765

----------------
Node ID: a70a0fed-c6c3-4399-94c9-21148ab34c34
Text: Sections   Who can apply:    Check your score:    Get your
documents ready:    Create your profile:    Rounds of invitations:
Apply for permanent residence:    After you apply:    If we approve
your application:
Score:  1.473

----------------
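
A note on reading these scores: the index is a `faiss.IndexFlatL2`, which ranks by squared Euclidean distance, so the scores printed here appear to be distances where lower means closer (assuming the vector store passes the FAISS distances through unchanged). If the embeddings are unit-length, squared L2 distance and cosine similarity are related by d² = 2 − 2·cos. A small sketch with made-up 3-dimensional vectors, not real embeddings:

```python
import math

def normalize(v):
    # Scale a vector to unit length
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    # Squared Euclidean distance, the metric IndexFlatL2 ranks by
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    # A plain dot product equals cosine similarity only for unit-length inputs
    return sum(x * y for x, y in zip(a, b))

# Illustrative vectors (assumptions, chosen by hand)
query = normalize([1.0, 2.0, 0.5])
close = normalize([1.0, 1.8, 0.7])
far = normalize([-0.5, 0.2, 2.0])

for name, vec in [("close", close), ("far", far)]:
    d2 = squared_l2(query, vec)
    # The two printed numbers agree: d^2 == 2 - 2*cos for unit vectors
    print(name, round(d2, 3), round(2 - 2 * cosine(query, vec), 3))
```

Under this reading, the "Fees" chunk (the smaller score) is the better match and the "Sections" chunk is the weaker one.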


In [12]:
response.text

"The fee for applying through the Express Entry system is $CAN\xa01,525. This fee applies to each applicant, including the primary applicant and any accompanying family members.\n\nThere are additional fees for family members, including:\n\n* Spouse: $CAN\xa01,525\n* Dependent child: $CAN\xa0260\n\nIt's important to note that these fees are subject to change, so it's best to check the official government website for the most up-to-date information."

In [46]:
response = query_rag_system("I have 3 dependent children and a spouse. How much will be my total applications fees?")

retrived nodes: 
Node ID: 68b89711-b98e-410d-90cf-baa63b2be82d
Text: Fees   Your application: $CAN 1,525   Fees for your family
members:  - Spouse: $CAN 1,525  - Dependent child: $CAN 260
Score:  0.267

----------------
Node ID: a70a0fed-c6c3-4399-94c9-21148ab34c34
Text: Sections   Who can apply:    Check your score:    Get your
documents ready:    Create your profile:    Rounds of invitations:
Apply for permanent residence:    After you apply:    If we approve
your application:
Score:  1.439

----------------


In [47]:
response.text

'The total application fees for your family of five (3 dependent children and a spouse) would be $CAN\xa02,485. This is calculated as follows:\n\n* Your application fee: $CAN\xa01,525\n* Fees for your spouse: $CAN\xa01,525\n* Fees for each dependent child: $CAN\xa0260 x 3 = $CAN\xa0780\n\nTotal fees: $CAN\xa02,485.'

In [48]:
response = query_rag_system("I am applying for myself and my spouse. How much will be my total applications fees?")

retrived nodes: 
Node ID: 68b89711-b98e-410d-90cf-baa63b2be82d
Text: Fees   Your application: $CAN 1,525   Fees for your family
members:  - Spouse: $CAN 1,525  - Dependent child: $CAN 260
Score:  0.509

----------------
Node ID: a70a0fed-c6c3-4399-94c9-21148ab34c34
Text: Sections   Who can apply:    Check your score:    Get your
documents ready:    Create your profile:    Rounds of invitations:
Apply for permanent residence:    After you apply:    If we approve
your application:
Score:  1.254

----------------


In [49]:
response.text

'Based on the information provided in the context, if you are applying for yourself and your spouse, the total application fees would be $CAN 1,525 + $CAN 1,525 = $CAN 3,050.'

In [50]:
response = query_rag_system("I am a single mother with 1 child. How much is my application fees.")

retrived nodes: 
Node ID: 68b89711-b98e-410d-90cf-baa63b2be82d
Text: Fees   Your application: $CAN 1,525   Fees for your family
members:  - Spouse: $CAN 1,525  - Dependent child: $CAN 260
Score:  0.456

----------------
Node ID: a70a0fed-c6c3-4399-94c9-21148ab34c34
Text: Sections   Who can apply:    Check your score:    Get your
documents ready:    Create your profile:    Rounds of invitations:
Apply for permanent residence:    After you apply:    If we approve
your application:
Score:  1.500

----------------


In [51]:
response.text

'Based on the information provided in the context, the application fee for a single mother with one child is $CAN 1,525. This is the same as the application fee for the main applicant. The fees for the family members are as follows:\n\n* Spouse: $CAN 1,525\n* Dependent child: $CAN 260\n\nSo, the total application fee for a single mother with one child would be $CAN 1,525 + $CAN 260 = $CAN 1,785.'
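
For comparison, the totals for the three fee questions above can be computed deterministically from the fees in the retrieved chunk ($CAN 1,525 for the applicant, $CAN 1,525 for a spouse, $CAN 260 per dependent child). This is roughly the calculation a custom tool would wrap; the function name and signature below are hypothetical:

```python
# Fees taken from the retrieved "Fees" chunk, in $CAN
APPLICANT_FEE = 1525
SPOUSE_FEE = 1525
DEPENDENT_CHILD_FEE = 260

def total_application_fees(has_spouse, num_children):
    """Hypothetical helper: total fees for one principal applicant plus family."""
    total = APPLICANT_FEE
    if has_spouse:
        total += SPOUSE_FEE
    total += DEPENDENT_CHILD_FEE * num_children
    return total

print(total_application_fees(has_spouse=True, num_children=3))   # 3830
print(total_application_fees(has_spouse=True, num_children=0))   # 3050
print(total_application_fees(has_spouse=False, num_children=1))  # 1785
```

The second and third totals match the LLM's answers. For the first question, the line items listed in the response sum to $CAN 3,830, whereas the response states $CAN 2,485.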