### Install Packages

In [2]:
%pip install llama-index==0.10.18 llama-index-llms-groq==0.1.3 groq==0.4.2 llama-index-embeddings-huggingface==0.2.0 python-dotenv


### Import Libraries

In [3]:
import os
import warnings

from dotenv import load_dotenv
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    ServiceContext,
    load_index_from_storage
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.groq import Groq

load_dotenv()  # read variables such as GROQ_API_KEY from a local .env file
warnings.filterwarnings('ignore')  # keep library warnings out of the cell output



In [4]:
# from google.colab import userdata
# GROQ_API_KEY = userdata.get('groq')

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
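
`os.getenv` returns `None` when the variable is unset, which can lead to a less obvious error later in the notebook. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string.
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

With this in place, the assignment above could read `GROQ_API_KEY = require_env("GROQ_API_KEY")`.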

### Data Ingestion

In [5]:
# Load the source PDF from the local data folder
reader = SimpleDirectoryReader(input_files=["./data/Basics_of_finance.pdf"])
documents = reader.load_data()

https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/

In [6]:
# The PDF has 86 pages; the reader returns one document per page
len(documents)
# print(documents)

86

In [7]:
# Metadata of the 11th page of the document (zero-based index 10)
documents[10].metadata

{'page_label': '11',
 'file_name': 'Basics_of_finance.pdf',
 'file_path': 'data\\Basics_of_finance.pdf',
 'file_type': 'application/pdf',
 'file_size': 1879774,
 'creation_date': '2025-03-10',
 'last_modified_date': '2025-03-10'}

### Chunking

In [8]:
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
nodes = text_splitter.get_nodes_from_documents(documents, show_progress=True)

Parsing nodes: 100%|██████████| 86/86 [00:00<00:00, 1792.30it/s]


https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/

In [9]:
len(nodes)

86

In [10]:
nodes[0].metadata

{'page_label': '1',
 'file_name': 'Basics_of_finance.pdf',
 'file_path': 'data\\Basics_of_finance.pdf',
 'file_type': 'application/pdf',
 'file_size': 1879774,
 'creation_date': '2025-03-10',
 'last_modified_date': '2025-03-10'}

https://chunkviz.up.railway.app/
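
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch. It is an illustration only: `SentenceSplitter` counts tokens with a tokenizer and tries to respect sentence boundaries, whereas this hypothetical `sliding_window_chunks` helper splits a plain list of words.

```python
from typing import List


def sliding_window_chunks(
    tokens: List[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size tokens.

    Each window repeats the last chunk_overlap tokens of the previous one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


words = "a b c d e f g h i j".split()
for chunk in sliding_window_chunks(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

An input shorter than `chunk_size` yields a single chunk, which is consistent with the 86 pages above producing 86 nodes at `chunk_size=1024`.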

### Embedding Model

In [11]:
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

https://huggingface.co/spaces/mteb/leaderboard
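
The embedding model maps each chunk to a fixed-length vector (384 dimensions for `all-MiniLM-L6-v2`), and retrieval compares those vectors by similarity. The sketch below shows cosine similarity on toy vectors; the function name and the sample values are illustrative and are not produced by the model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real 384-dimensional embeddings.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Vectors pointing in the same direction score close to 1, and unrelated (orthogonal) vectors score close to 0.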

### Define LLM Model

In [12]:
llm = Groq(model="llama3-70b-8192", api_key=GROQ_API_KEY)

https://console.groq.com/docs/models

https://console.groq.com/keys

### Configure Service Context

In [13]:
service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)

### Create Vector Store Index

In [14]:
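# Pass the splitter via `transformations`; `nodes` is a list of parsed nodes, not a parser.
vector_index = VectorStoreIndex.from_documents(
    documents,
    transformations=[text_splitter],
    service_context=service_context,
    show_progress=True,
)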

Parsing nodes: 100%|██████████| 86/86 [00:00<00:00, 1881.55it/s]
Generating embeddings: 100%|██████████| 86/86 [00:02<00:00, 39.63it/s]


https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index/
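
Conceptually, a vector store index keeps one embedding per node and answers a query by returning the nodes whose embeddings are most similar to the query embedding. The following is a brute-force sketch of that idea, not LlamaIndex's implementation; the `top_k` helper, the node ids, and the vectors are all made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k (node_id, score) pairs most similar to the query vector."""
    scored = [(node_id, _cosine(query_vec, vec)) for node_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical store: node id -> toy embedding.
store = {
    "page-4": [0.9, 0.1, 0.0],
    "page-11": [0.1, 0.9, 0.0],
    "page-20": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], store, k=2))
```

Real vector stores typically replace the linear scan with an approximate nearest-neighbor structure once the collection grows large; for 86 nodes a scan like this is more than adequate.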

#### Persist/Save Index

In [15]:
vector_index.storage_context.persist(persist_dir="./storage_mini")

#### Define Storage Context

In [16]:
storage_context = StorageContext.from_defaults(persist_dir="./storage_mini")

https://docs.llamaindex.ai/en/stable/api_reference/storage/storage_context/

#### Load Index

In [17]:
index = load_index_from_storage(storage_context, service_context=service_context)
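
Persisting and reloading are usually combined so that embeddings are computed only once. A minimal sketch of that pattern follows, assuming a hypothetical `load_or_build` helper that takes the loading and building steps as callables so it stays independent of any particular library.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    persist_dir: str, load: Callable[[str], T], build: Callable[[str], T]
) -> T:
    """Reuse a persisted index if the directory has content; otherwise build one."""
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    return build(persist_dir)


# In this notebook the callables might wrap the cells above, for example:
#   load:  StorageContext.from_defaults(persist_dir=...) followed by
#          load_index_from_storage(...)
#   build: VectorStoreIndex.from_documents(...) followed by
#          index.storage_context.persist(persist_dir=...)
```

The check for a non-empty directory is a simple heuristic; it does not verify that the stored files match the current documents or embedding model.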

### Define Query Engine

In [18]:
query_engine = index.as_query_engine(service_context=service_context)

https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/

#### Feed in user query

https://docs.llamaindex.ai/en/stable/examples/prompts/prompts_rag/#viewingcustomizing-prompts
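
The query engine places the retrieved chunks into a prompt together with the user question before calling the LLM. The sketch below shows that assembly step with a template modeled on the default question-answering prompt described at the link above; the exact wording may differ between versions, and `build_prompt` is a hypothetical helper.

```python
from typing import List

# Assumed template text; check the linked documentation for the current default.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_prompt(query: str, chunks: List[str]) -> str:
    """Join the retrieved chunks and fill them into the QA template."""
    context = "\n\n".join(chunks)
    return QA_TEMPLATE.format(context_str=context, query_str=query)


print(build_prompt("What is a bond?", ["Bonds are long-term liabilities."]))
```

An instruction to rely on the context "and not prior knowledge" would explain why a question about a topic missing from the PDF tends to produce a refusal rather than a general-knowledge answer.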

## Example-1: Query returning result from the queried document

In [26]:
query = "Explain bonds, debts, and healthy lifestyle (General knowledge) and provide the page reference(s) in a dictionary format inside a list. provide also a description field as a concise summary of about 100 words for each"
resp = query_engine.query(query)

In [27]:
print(resp.response)

Here is the answer:

[
    {
        "term": "Bonds",
        "description": "Bonds are long-term liabilities that a company or institution issues to raise capital. They are essentially debt securities that represent a loan made by an investor to the borrower. In the context of a balance sheet, bonds are listed as long-term liabilities, meaning they are due over 12 months.",
        "page_reference": [4]
    },
    {
        "term": "Debts",
        "description": "Debts refer to the amount of money owed by an individual or organization to another party. In the context of a balance sheet, debts are listed as liabilities, which can be either short-term (due within 12 months) or long-term (due over 12 months).",
        "page_reference": [4]
    },
    {
        "term": "Healthy Lifestyle",
        "description": "Not applicable in this context. The provided context information only discusses finance and accounting concepts, and does not mention healthy lifestyle.",
        "page_referen

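Because the model wraps its answer in prose and the output can be cut off, it helps to parse the response defensively before using it as data. A minimal sketch, assuming a hypothetical `extract_json_list` helper that returns `None` whenever the bracketed span is not valid JSON:

```python
import json
from typing import Any, Optional


def extract_json_list(text: str) -> Optional[Any]:
    """Parse the span from the first '[' to the last ']' as JSON, or return None."""
    start = text.find("[")
    end = text.rfind("]")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        # Truncated or otherwise malformed output ends up here.
        return None


sample = 'Here is the answer:\n\n[{"term": "Bonds", "page_reference": [4]}]'
print(extract_json_list(sample))
```

For a truncated response the function returns `None`, which the caller can treat as a signal to retry or to raise the output token limit.
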
## Example-2: Query about a topic that does not exist in the document

In [28]:
query = "Explain buyside and sellside and provide the page reference in a dictionary format. provide also a description field as a concise summary of about 100 words"
resp = query_engine.query(query)

In [22]:
print(resp.response)

Since the provided context does not mention "buyside" and "sellside", it is not possible to explain these terms or provide a page reference based on the given context.

Here is an empty dictionary as the answer:

{
"description": "No information available in the provided context.",
"page_reference": None
}

Please note that the context only discusses basic finance concepts, such as the balance sheet, assets, liabilities, and equity, but does not mention "buyside" and "sellside".
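
Page references written by the LLM can be unreliable, while the retrieved nodes carry the `page_label` metadata shown earlier. The sketch below collects page labels from retrieval results; it assumes each item exposes `.node.metadata` and `.score`, as the entries of `resp.source_nodes` are understood to do, and it uses stand-in objects with invented scores so that it runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Tuple


def page_references(source_nodes: Iterable[Any]) -> List[Tuple[str, float]]:
    """Collect (page_label, best score) pairs, highest score first."""
    best: Dict[str, float] = {}
    for item in source_nodes:
        label = item.node.metadata.get("page_label")
        if label is None:
            continue  # skip nodes without page information
        score = item.score if item.score is not None else 0.0
        if label not in best or score > best[label]:
            best[label] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


# Stand-in objects shaped like retrieval results (hypothetical values, not real output).
fake_nodes = [
    SimpleNamespace(node=SimpleNamespace(metadata={"page_label": "4"}), score=0.61),
    SimpleNamespace(node=SimpleNamespace(metadata={"page_label": "11"}), score=0.48),
]
print(page_references(fake_nodes))
```

In the notebook, the call would presumably be `page_references(resp.source_nodes)`.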


https://itsjb13.medium.com/building-a-rag-chatbot-using-llamaindex-groq-with-llama3-chainlit-b1709f770f55

https://docs.llamaindex.ai/en/stable/optimizing/production_rag/

In [29]:
from langchain_groq import ChatGroq

In [30]:
load_dotenv()

True

In [None]:
# Initialize a standalone Groq chat model (no retrieval) to answer from general knowledge
llm = ChatGroq(model="llama3-70b-8192")

TypeError: __init__() got an unexpected keyword argument 'proxies'
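
An unexpected `proxies` keyword is commonly reported when an older `groq` client is installed alongside a newer `httpx` release that no longer accepts that argument. That is an assumption about this environment, not something the traceback confirms, so a reasonable first step is to inspect the installed versions. The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["groq", "httpx", "langchain-groq"]))
```

If the versions turn out to be mismatched, upgrading `groq` or pinning an older `httpx` are the usual remedies to try; which one fits depends on the other pinned packages in this notebook.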