# API KEYS

In [None]:
from google.colab import userdata

LANGCHAIN_API_KEY = userdata.get('LANGCHAIN_API_KEY')
PINECONE_API_KEY = userdata.get('PINECONE_API_KEY')
OPENAI_API_KEY = userdata.get('openai_key')

# Load Documents

# Tracing with LangSmith

In [None]:
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "langchain-subrata"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = LANGCHAIN_API_KEY
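The four variables above are all this notebook sets to turn on LangSmith tracing. Below is a minimal sketch, assuming you would rather fail fast on an empty key than export it silently. The helper name `configure_langsmith` is hypothetical and not part of LangChain.

```python
import os


def configure_langsmith(api_key, project, endpoint="https://api.smith.langchain.com"):
    """Hypothetical helper: export the LangSmith tracing variables used in this notebook."""
    if not api_key:
        # Fail fast instead of exporting an empty key
        raise ValueError("LANGCHAIN_API_KEY is empty; add it to the Colab secrets first.")
    settings = {
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_PROJECT": project,
        "LANGCHAIN_ENDPOINT": endpoint,
        "LANGCHAIN_API_KEY": api_key,
    }
    os.environ.update(settings)  # same effect as the four assignments above
    return settings
```

In this notebook it would be called as `configure_langsmith(LANGCHAIN_API_KEY, "langchain-subrata")`.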

In [None]:
!pip install pypdf -q

In [None]:
!pip install langchain_community -q

In [None]:
!pip install docx2txt -q

In [None]:
!pip install wikipedia -q

## Wikipedia Functions

In [None]:
def load_from_wikipedia(query, lang="en", load_max_docs=2):
  """
  Load up to `load_max_docs` Wikipedia articles matching `query`.

  Each returned Document exposes:
    docs[0].metadata            # meta-information (title, summary, ...)
    docs[0].page_content[:400]  # the article text
  """
  from langchain_community.document_loaders import WikipediaLoader

  print("Loading from Wikipedia ...")
  # Pass the arguments through instead of hard-coding "Accounting"
  docs = WikipediaLoader(query=query, lang=lang, load_max_docs=load_max_docs).load()
  print("Done!")
  return docs

In [None]:
docs = load_from_wikipedia("Accounting")

Loading from Wikipedia ...
Done!

In [None]:
docs[0].metadata  # meta-information of the Document

{'title': 'Accounting',
 'summary': 'Accounting, also known as accountancy, is the process of recording and processing information about economic entities, such as businesses and corporations. Accounting measures the results of an organization\'s economic activities and conveys this information to a variety of stakeholders, including investors, creditors, management, and regulators. Practitioners of accounting are known as accountants. The terms "accounting" and "financial reporting" are often used interchangeably.\nAccounting can be divided into several fields including financial accounting, management accounting, tax accounting and cost accounting. Financial accounting focuses on the reporting of an organization\'s financial information, including the preparation of financial statements, to the external users of the information, such as investors, regulators and suppliers. Management accounting focuses on the measurement, analysis and reporting of information for internal use by mana

In [None]:
docs[0].page_content[:400]  # a content of the Document

"Accounting, also known as accountancy, is the process of recording and processing information about economic entities, such as businesses and corporations. Accounting measures the results of an organization's economic activities and conveys this information to a variety of stakeholders, including investors, creditors, management, and regulators. Practitioners of accounting are known as accountants"

## PDF & DOCX Functions

In [None]:
def load_document(file):
  import os

  print("Detecting file type ...")
  name, extension = os.path.splitext(file)
  extension = extension.lower()  # accept ".PDF" as well as ".pdf"
  print(f"{extension[1:]} file type detected!")

  if extension == ".pdf":
    from langchain_community.document_loaders import PyPDFLoader
    loader = PyPDFLoader(file)
  elif extension == ".docx":
    from langchain_community.document_loaders import Docx2txtLoader
    loader = Docx2txtLoader(file)
  else:
    print(f"Unsupported file type: {extension}")
    return None

  print(f"Loading {file} ...")
  # load_and_split() also applies a default RecursiveCharacterTextSplitter,
  # so the returned Documents are text segments, not necessarily physical pages
  pages = loader.load_and_split()

  print(f"{len(pages)} pages parsed successfully!")
  print("Done!")
  return pages
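`load_document` picks a loader by branching on the file extension. A table-driven variant turns supporting another format into a one-line change. This is a sketch under stated assumptions: `SUPPORTED_LOADERS` and `pick_loader` are hypothetical names, and the loaders are returned as plain strings so the snippet runs without LangChain installed; in the notebook you would map to the imported loader classes instead.

```python
import os

# Hypothetical table: file extension -> LangChain loader used in load_document
SUPPORTED_LOADERS = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
}


def pick_loader(path):
    """Return the loader name for `path`, or raise for unsupported extensions."""
    _, extension = os.path.splitext(path)
    extension = extension.lower()  # treat "Report.PDF" like "report.pdf"
    try:
        return SUPPORTED_LOADERS[extension]
    except KeyError:
        raise ValueError(f"Unsupported file type: {extension or '(none)'}") from None
```

Raising instead of returning `None` means a typo in the path surfaces immediately rather than as a confusing error in the chunking step.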

## Loading a Microsoft Word Document

In [None]:
# loading the document
pages = load_document("/content/accounting.docx")

Detecting file type ...
docx file type detected!
Loading /content/accounting.docx ...
203 pages parsed successfully!
Done!


In [None]:
pages[10]

Document(metadata={'source': '/content/accounting.docx'}, page_content="Accounting and Statistics: The use of statistics in accounting can be appreciated better in the context of the nature of accounting records. Accounting information is very precise; it is exact to the last paisa. But, for decision-making purposes such precision is not necessary and hence, the statistical approximations are sought.\n\nIn accounts, all values are important individually because they relate to business transactions. As against this, statistics is concerned with the typical value, behaviour or trend over a period of time or the degree of variation over a series of observations. Therefore, wherever a need arises for only broad generalisations or the average of relationships, statistical methods have to be applied in accounting data.\n\nFurther, in accountancy, the classification of assets and liabilities as well as the heads of income and expenditure has been done as per the needs of financial recording t

In [None]:
pages[10].page_content

"Accounting and Statistics: The use of statistics in accounting can be appreciated better in the context of the nature of accounting records. Accounting information is very precise; it is exact to the last paisa. But, for decision-making purposes such precision is not necessary and hence, the statistical approximations are sought.\n\nIn accounts, all values are important individually because they relate to business transactions. As against this, statistics is concerned with the typical value, behaviour or trend over a period of time or the degree of variation over a series of observations. Therefore, wherever a need arises for only broad generalisations or the average of relationships, statistical methods have to be applied in accounting data.\n\nFurther, in accountancy, the classification of assets and liabilities as well as the heads of income and expenditure has been done as per the needs of financial recording to ascertain financial results of various operations. Other types of cla

In [None]:
from html import escape
from textwrap import fill
from IPython.display import display, HTML

def print_wrapped(text, width=80):
    wrapped_text = fill(text, width=width)
    # Escape <, > and & so the document text is shown literally, not parsed as HTML
    display(HTML(f"<pre style='white-space:pre-wrap;'>{escape(wrapped_text)}</pre>"))

for page in pages[0:5]:
    print_wrapped(str(page))
    print("="*100)

## Loading a PDF

In [None]:
# loading the document
pages = load_document("/content/accounting.pdf")



Detecting file type ...
pdf file type detected!
Loading /content/accounting.pdf ...
387 pages parsed successfully!
Done!


In [None]:
# checking the type and the results
print(type(pages))
print(pages)

<class 'list'>
[Document(metadata={'source': '/content/accounting.pdf', 'page': 0}, page_content='CHAPTER 1 UNIT - 1  1.MEANING AND SCOPE OF ACCOUNTING 1.INTRODUCTION Every individual performs some kind of economic activity. A salaried person gets salary and spends to buy provisions and clothing, for children\'s education, construction of house, etc. A sports club formed by a group of individuals, a business run by an individual or a group of individuals, a company running a business in telecom sector, a local authority like Calcutta Municipal Corporation, Delhi Development Authority, Governments, either Central or State, all are carrying some kind of economic activities. Not necessarily all the economic activities are run for any individual benefit; such economic activities may create social benefit i.e. benefit for the public, at large. Anyway, such economic activities are performed through \'transactions and events\'. Transaction is used to mean \'a business, performance of an act, 

In [None]:
# checking info
print(pages[0].page_content)
print(pages[0].metadata)
print(f"Total number of characters in the first page: {len(pages[0].page_content)}")

CHAPTER 1 UNIT - 1  1.MEANING AND SCOPE OF ACCOUNTING 1.INTRODUCTION Every individual performs some kind of economic activity. A salaried person gets salary and spends to buy provisions and clothing, for children's education, construction of house, etc. A sports club formed by a group of individuals, a business run by an individual or a group of individuals, a company running a business in telecom sector, a local authority like Calcutta Municipal Corporation, Delhi Development Authority, Governments, either Central or State, all are carrying some kind of economic activities. Not necessarily all the economic activities are run for any individual benefit; such economic activities may create social benefit i.e. benefit for the public, at large. Anyway, such economic activities are performed through 'transactions and events'. Transaction is used to mean 'a business, performance of an act, an agreement' while event is used to mean 'a happening, as a consequence of transaction(s), a result.'

In [None]:
# checking info
print(pages[1].page_content)
print(pages[1].metadata)
print(f"Total number of characters in the second page: {len(pages[1].page_content)}")

transactions (transactions made by the user, raising invoice to the customer, receipt of money, payment towards salaries, marketing etc.). Likewise, the individual running the stationery business, would need to record all business transactions. This recording is done in Journal or subsidiary books, also known as primary books. Every good record keeping system includes suitable classification of transactions and events as well as their summarisation for ready reference. For example, the telecom company performing thousands of transactions on a daily basis, is not expected to publish all those transactions for the users to be able to make a decision. Surely, those transactions need to be summarized appropriately. After the transactions and events are recorded, they are transferred to secondary books, i.e., Ledger. In ledger, transactions and events are classified in terms of income, expense, assets and liabilities according to their characteristics and summarised in profit and loss accou

In [None]:
# checking info
print(pages[2].page_content)
print(pages[2].metadata)
print(f"Total number of characters in the third page: {len(pages[2].page_content)}")

"The function of accounting is to provide quantitative information, primarily of financial nature, about economic entities, that is needed to be useful in making economic decisions." Thus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating the results thereof to the persons interested in such information. The above definition requires accountants to assume a bigger responsibility than to merely do book-keeping. Accountants need to be ready to provide the information ready for the intended users to be able to make economic decisions. 2.1.Procedural aspects of Accounting On the basis of the above definitions, procedure of accounting can be basically divided into two parts: i.Generating financial information and ii.Using the financial information. Generating Financial Information 1.Recording - This is the basic function of accounting. All business transactions of a financial character, as

# Chunk Data Function

In [None]:
!pip install langchain-text-splitters -q

In [None]:
def chunk_data(data, chunk_size=256, chunk_overlap=0):
  from langchain_text_splitters import RecursiveCharacterTextSplitter

  print("Chunking ...")

  text_splitter = RecursiveCharacterTextSplitter(
      chunk_size=chunk_size,        # maximum characters per chunk
      chunk_overlap=chunk_overlap   # characters shared between adjacent chunks
  )
  chunks = text_splitter.split_documents(data)

  print(f"Number of chunks: {len(chunks)}")
  print("Done!")

  return chunks

In [None]:
chunks = chunk_data(data=pages, chunk_size=1000)

Chunking ...
Number of chunks: 1026
Done!


In [None]:
chunks[0]

Document(metadata={'source': '/content/accounting.docx'}, page_content='CHAPTER 1\n\nUNIT - 1 \n\nMEANING AND SCOPE OF ACCOUNTING\n\nINTRODUCTION\n\n\n\nEvery individual performs some kind of economic activity. A salaried person gets salary and spends to buy provisions and clothing, for children\'s education, construction of house, etc. A sports club formed by a group of individuals, a business run by an individual or a group of individuals, a company running a business in telecom sector, a local authority like Calcutta Municipal Corporation, Delhi Development Authority, Governments, either Central or State, all are carrying some kind of economic activities. Not necessarily all the economic activities are run for any individual benefit; such economic activities may create social benefit i.e. benefit for the public, at large. Anyway, such economic activities are performed through \'transactions and events\'. Transaction is used to mean \'a business, performance of an act, an agreement\'

In [None]:
chunks[0].page_content

'CHAPTER 1\n\nUNIT - 1 \n\nMEANING AND SCOPE OF ACCOUNTING\n\nINTRODUCTION\n\n\n\nEvery individual performs some kind of economic activity. A salaried person gets salary and spends to buy provisions and clothing, for children\'s education, construction of house, etc. A sports club formed by a group of individuals, a business run by an individual or a group of individuals, a company running a business in telecom sector, a local authority like Calcutta Municipal Corporation, Delhi Development Authority, Governments, either Central or State, all are carrying some kind of economic activities. Not necessarily all the economic activities are run for any individual benefit; such economic activities may create social benefit i.e. benefit for the public, at large. Anyway, such economic activities are performed through \'transactions and events\'. Transaction is used to mean \'a business, performance of an act, an agreement\' while event is used to mean \'a happening, as a consequence of transac
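`RecursiveCharacterTextSplitter` tries coarse separators first (blank lines, then newlines, then spaces) and falls back to finer ones only when a piece is still longer than `chunk_size`. The sketch below illustrates that idea in plain Python. It is a simplified stand-in, not LangChain's implementation: it ignores `chunk_overlap` (which is 0 in this notebook anyway) and LangChain's separator-keeping and length-function options. The name `recursive_split` is hypothetical.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive splitter: coarse separators first, finer ones as fallback."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    # The empty separator means "split into single characters" as a last resort
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "para one is here\n\npara two is a bit longer than the first"
print(recursive_split(text, 20))
```

Every returned chunk is at most `chunk_size` characters, and paragraph boundaries are preferred over mid-sentence cuts whenever they fit.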

## Embedding Cost

In [None]:
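def print_embedding_cost(texts, price_per_1k_tokens=0.0003):
    """Estimate the embedding cost for a list of LangChain Documents."""
    import tiktoken
    # text-embedding-ada-002 and the text-embedding-3-* models all use cl100k_base
    enc = tiktoken.get_encoding("cl100k_base")
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Total Tokens: {total_tokens}')
    # price_per_1k_tokens is an assumed rate; check OpenAI's current pricing
    print(f'Embedding Cost in USD: {total_tokens / 1000 * price_per_1k_tokens:.6f}')
    return total_tokens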

print_embedding_cost(texts=chunks)

Total Tokens: 154933
Embedding Cost in USD: 0.046480


154933
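The printed cost is just the token total scaled by a per-1K-token rate: 154,933 / 1000 × 0.0003 = 0.0464799, shown as 0.046480 USD. The 0.0003 USD rate is this notebook's assumption; providers often quote prices per 1M tokens, so a small conversion helper avoids off-by-1000 mistakes. Both function names are hypothetical, and you should substitute the current rate for `text-embedding-3-large`, the model actually used below.

```python
def embedding_cost_usd(total_tokens, price_per_1k_tokens):
    """Cost = tokens / 1000 * price per 1K tokens."""
    return total_tokens / 1000 * price_per_1k_tokens


def per_million_to_per_1k(price_per_million):
    """Convert a per-1M-token price into the per-1K-token rate used above."""
    return price_per_million / 1000


print(f"{embedding_cost_usd(154933, 0.0003):.6f}")
```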

## OpenAI Embedding


In [None]:
!pip install langchain-openai -qU
from langchain_openai import OpenAIEmbeddings

# Initialize the embedding model
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    openai_api_key=OPENAI_API_KEY
  )

# Embed a single query
text = "This is a test document."
query_result = embeddings.embed_query(text)
print(len(query_result),query_result)

3072 [-0.014380057342350483, -0.027191713452339172, -0.020042717456817627, 0.05730138346552849, -0.02226766012609005, 0.0215016957372427, -0.023234233260154724, 0.06408563256263733, -0.01676001586019993, 0.01894848421216011, 0.0184651967138052, 0.024693211540579796, -0.01564754545688629, -0.04847456142306328, -0.007076045963913202, 0.03869940713047981, -0.023325419053435326, -0.001235572504810989, -0.012994027696549892, -0.02354426681995392, 0.01656852476298809, 0.004500037059187889, -0.04136204347014427, 0.045520130544900894, 0.01589374803006649, 0.016860321164131165, -0.00019719006377272308, 0.008038059808313847, 0.018246350809931755, 0.004091979004442692, 0.016085239127278328, 0.045629553496837616, -0.023507792502641678, -0.027975913137197495, 0.05328918993473053, -0.004226478282362223, 0.044827114790678024, 0.059708695858716965, 0.015538121573626995, -0.01699710078537464, 0.030711498111486435, -0.012082166038453579, -0.022431794553995132, 0.02702757716178894, 0.026079241186380386, 
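Each embedding is a list of 3072 floats (the length printed above), and the Pinecone index created later uses `metric="cosine"`, which compares vectors by the angle between them rather than their raw magnitude. A pure-Python sketch of that metric, for illustration only; Pinecone computes similarity server-side.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction, different length
```

The dimension check mirrors why the index dimension must match the embedding model: vectors of different lengths cannot be compared.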

## List and Delete Pinecone Indexes

In [None]:
!pip install pinecone -q
from pinecone import Pinecone

pc = Pinecone(api_key=PINECONE_API_KEY)

indexes = pc.list_indexes()
print(indexes)

{'indexes': [{'deletion_protection': 'disabled',
              'dimension': 3072,
              'host': 'subrata-rag-qa-0390e5b.svc.aped-4627-b74a.pinecone.io',
              'metric': 'cosine',
              'name': 'subrata-rag-qa',
              'spec': {'serverless': {'cloud': 'aws', 'region': 'us-east-1'}},
              'status': {'ready': True, 'state': 'Ready'}}]}


In [None]:
len(indexes)

1

In [None]:
for index in indexes:
    print(index["name"])

subrata-rag-qa


In [None]:
def delete_pinecone_indexes(index_name="all", pinecone_api_key=PINECONE_API_KEY):
  from pinecone import Pinecone

  pc = Pinecone(api_key=pinecone_api_key)

  if index_name == "all":
    indexes = pc.list_indexes()
    print("Deleting all indexes ...")
    for index in indexes:
      # Report which index is being deleted before the delete call, not after it
      print(f"Deleting index: {index['name']} ...")
      pc.delete_index(index["name"])
  else:
    print(f"Deleting Index: {index_name}")
    pc.delete_index(index_name)
    print("Done!")
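Because `index_name="all"` removes every index in the project, a dry run is a cheap safety net. The helper below is hypothetical (the `keep` parameter in particular is an assumption, not a Pinecone client feature): it only computes which names would be deleted. In the notebook you could feed it `[index_info["name"] for index_info in pc.list_indexes()]`, the same pattern `create_index` uses.

```python
def plan_index_deletion(existing_names, index_name="all", keep=()):
    """Hypothetical dry run: return the index names delete_pinecone_indexes would remove."""
    if index_name == "all":
        # Everything except explicitly protected indexes
        return [name for name in existing_names if name not in keep]
    if index_name not in existing_names:
        raise ValueError(f"No index named {index_name!r}")
    return [index_name]


print(plan_index_deletion(["subrata-rag-qa"]))
```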

## Create Pinecone Index Function

In [None]:
!pip install pinecone -q

In [None]:
!pip install langchain_pinecone -q

In [None]:
def create_index(index_name, dimension=3072, pinecone_api_key=PINECONE_API_KEY):
  import time
  from pinecone import Pinecone, ServerlessSpec

  pc = Pinecone(api_key=pinecone_api_key)
  existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]

  if index_name not in existing_indexes:
    print(f"Creating index {index_name} ...")
    pc.create_index(
        name=index_name,
        dimension=dimension,  # must equal the embedding size (3072 for text-embedding-3-large)
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    # Poll until the serverless index is ready to accept upserts
    while not pc.describe_index(index_name).status["ready"]:
      time.sleep(1)

  index = pc.Index(index_name)
  return index
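The readiness loop in `create_index` polls once per second with no upper bound, so a stuck index would hang the notebook. A bounded polling sketch follows; `wait_until` is a hypothetical helper, and the injectable `sleep`/`clock` arguments exist only so it can be tested without real waiting.

```python
import time


def wait_until(predicate, timeout_s=60.0, interval_s=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll `predicate` until it returns True; raise TimeoutError after `timeout_s` seconds."""
    deadline = clock() + timeout_s
    while not predicate():
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        sleep(interval_s)
    return True
```

Inside `create_index`, the loop could become `wait_until(lambda: pc.describe_index(index_name).status["ready"], timeout_s=120)`; the 120-second limit is an arbitrary example value.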

## Upsert Embeddings Function

In [None]:
def upstream_embeddings(chunks, index_name, openai_api_key=OPENAI_API_KEY, pinecone_api_key=PINECONE_API_KEY):
  import os
  from langchain_pinecone import PineconeVectorStore
  from langchain_openai import OpenAIEmbeddings

  # PineconeVectorStore reads the key from PINECONE_API_KEY when it is not passed
  # explicitly; use the argument, not the global, so the parameter is honoured
  os.environ['PINECONE_API_KEY'] = pinecone_api_key
  embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",  # 3072-dimension vectors, matching the index
    openai_api_key=openai_api_key
  )
  # Embed every chunk and upsert the vectors into the existing index
  vector_store = PineconeVectorStore.from_documents(documents=chunks, embedding=embeddings, index_name=index_name)
  print("Done filling up the Pinecone Vector store with the Embeddings (Knowledge)")
  return vector_store
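`from_documents` embeds and upserts the chunks in batches rather than in one giant request. To estimate how many requests 1026 chunks become, a batching sketch helps; the batch size of 32 is an assumed, illustrative value. Python 3.12+ ships `itertools.batched`, which yields tuples; this version yields lists and also runs on older interpreters.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of at most `batch_size` items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


print(len(list(batched(range(1026), 32))))  # 32 full batches plus one batch of 2
```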

## Delete existing Pinecone Indexes

In [None]:
%%time
# delete existing indexes first: the free tier limits how many indexes a project can have
delete_pinecone_indexes(index_name="all", pinecone_api_key=PINECONE_API_KEY)

Deleting all indexes ...
Deleting index: subrata-rag-qa ...
CPU times: user 135 ms, sys: 7.14 ms, total: 142 ms
Wall time: 5.58 s


## Create Index

In [None]:
%%time
# create index
index=create_index(index_name="subrata-rag-qa", pinecone_api_key=PINECONE_API_KEY)

Creating index subrata-rag-qa ...
CPU times: user 157 ms, sys: 9.98 ms, total: 167 ms
Wall time: 5.65 s


## Upsert Embedded Data to Pinecone

In [None]:
%%time
vector_store=upstream_embeddings(chunks=chunks, index_name="subrata-rag-qa", openai_api_key=OPENAI_API_KEY)

Done filling up the Pinecone Vector store with the Embeddings (Knowledge)


In [None]:
type(vector_store)

In [None]:
vector_store

<langchain_pinecone.vectorstores.PineconeVectorStore at 0x7937a6c1bb80>

In [None]:
vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})
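With `search_type="similarity"` and `search_kwargs={"k": 5}`, the retriever returns the 5 chunks whose embeddings score highest against the query embedding. The brute-force sketch below shows only that ranking; Pinecone performs the search server-side with its own indexing instead of scoring every vector in Python. `top_k_similar` is a hypothetical name.

```python
import math


def top_k_similar(query_vec, doc_vecs, k=5):
    """Return (index, score) pairs for the k vectors most cosine-similar to query_vec."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(top_k_similar([1.0, 0.0], docs, k=2))
```

If fewer than `k` vectors exist, all of them are returned, which matches what you would expect from a small index.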

# Asking & Getting Answers

In [None]:
!pip install langchain-openai -qU

In [None]:
def ask_and_get_answer(vector_store, query, openai_api_key=OPENAI_API_KEY):
  from langchain_openai import ChatOpenAI
  from langchain.chains import RetrievalQA

  llm = ChatOpenAI(
      model="gpt-4o",
      temperature=0,
      max_tokens=None,
      timeout=None,
      max_retries=2,
      api_key=openai_api_key,
  )
  retriever=vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

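  qa_chain = RetrievalQA.from_chain_type(
      llm=llm,
      chain_type="stuff",  # put all k retrieved chunks into a single prompt
      retriever=retriever
  )
  # Chain.run() is deprecated; invoke() returns a dict with the answer under "result"
  result = qa_chain.invoke({"query": query})
  return result["result"]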
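`chain_type="stuff"` means every retrieved chunk is pasted into one prompt alongside the question. With `k=5` and `chunk_size=1000`, that is up to roughly 5,000 characters of context per call. The sketch below is a simplified illustration of the strategy; the prompt wording is an assumption and is not LangChain's built-in template.

```python
def build_stuff_prompt(question, documents, separator="\n\n"):
    """Simplified 'stuff' strategy: concatenate every retrieved chunk into one prompt."""
    context = separator.join(documents)  # all k chunks go into a single context block
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_stuff_prompt("What is a ledger?", ["chunk A", "chunk B"]))
```

The trade-off is simplicity versus context length: "stuff" makes a single LLM call, but a large `k` or chunk size can exceed the model's context window.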

In [None]:
# testing 1
query="What is the whole document about?"
answer=ask_and_get_answer(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
print(answer)


In [None]:
answer

"The document appears to be an excerpt from a textbook on accounting. It covers several fundamental aspects of accounting, including:\n\n1. **Meaning and Scope of Accounting**: This section introduces the concept of accounting, defining it as the art of recording, classifying, and summarizing financial transactions and events. It emphasizes the importance of full, fair, and adequate disclosure in financial statements to ensure that they provide a true and fair view of the business's financial position.\n\n2. **Principles of Full Disclosure and Completeness**: The document discusses the principles of full, fair, and adequate disclosure, which require that financial statements disclose all relevant and reliable information. It also touches on the principle of completeness, which ensures that financial information is not misleading or deficient.\n\n3. **Capital and Revenue Expenditures and Receipts**: This section introduces the concepts of capital and revenue expenditures and receipts, h

In [None]:
# testing 2
query="Give a brief summary about Accounting."
answer=ask_and_get_answer(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
print(answer)

Accounting is a comprehensive process that involves identifying, measuring, recording, classifying, summarizing, analyzing, and interpreting financial transactions and events. Its primary purpose is to provide quantitative financial information about economic entities, which is essential for making informed economic decisions. The process begins with book-keeping, which systematically records financial transactions, and extends to accounting, which summarizes and interprets these records to generate financial statements and reports. These outputs are crucial for management and other stakeholders to assess the financial health and performance of a business. Accounting also plays a vital role in communicating economic information to users, enabling them to make informed judgments and decisions.


In [None]:
# testing 3
query="What are the procedural aspects of accounting?"
answer=ask_and_get_answer(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
print(answer)

The procedural aspects of accounting can be divided into two main parts:

1. **Generating Financial Information:**
   - **Recording:** This is the basic function of accounting. All business transactions of a financial character, as evidenced by documents such as sales bills, passbooks, salary slips, etc., are recorded in the books of account. Recording is done in a book called the "Journal," which may be further divided into several subsidiary books according to the nature and size of the business. After recording, transactions are transferred to secondary books, i.e., the Ledger, where they are classified in terms of income, expense, assets, and liabilities and summarized in the profit and loss account and balance sheet.
   - **Measurement:** Transactions and events are measured in terms of money, using the ruling currency of the country (e.g., rupee in India, dollar in the U.S.A.). The transactions must have financial characteristics.
   - **Classification and Summarization:** Transa

In [None]:
# testing 4
query="Explain the procedural aspects of accounting?"
answer=ask_and_get_answer(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
print(answer)

The procedural aspects of accounting can be divided into two main parts: generating financial information and using the financial information.

### Generating Financial Information

1. **Recording**:
   - This is the basic function of accounting. It involves documenting all business transactions of a financial nature, which are evidenced by documents such as sales bills, passbooks, salary slips, etc.
   - Recording is done in a book called the "Journal," which may be divided into several subsidiary books based on the nature and size of the business.
   - After recording, transactions are transferred to secondary books, i.e., the Ledger, where they are classified in terms of income, expense, assets, and liabilities.
   - Transactions and events are measured in terms of money, using the ruling currency of the country (e.g., rupee in India, dollar in the U.S.A.).

2. **Classifying**:
   - Transactions recorded in the Journal are classified in the Ledger. This involves grouping transaction

In [None]:
# testing 5
query="Explain the evolution of accounting as a social science?"
answer=ask_and_get_answer(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
print(answer)

The evolution of accounting as a social science can be understood through its historical development and its increasing role in serving societal needs. Here is a detailed explanation:

### Historical Development

1. **Ancient Civilizations**:
   - **Egyptians (around 4000 BC)**: Used accounting for their treasuries, with day-wise reports sent to superiors and monthly reports to kings.
   - **Babylonia**: Employed accounting to identify losses due to fraud and inefficiency in commerce.
   - **Greece**: Utilized accounting to manage government financial transactions, including receipts, payments, and balances.
   - **Romans (700 B.C to 400 A.D)**: Maintained records of receipts and payments in memoranda or daybooks.
   - **China (2000 B.C)**: Had sophisticated government accounting systems.
   - **India**: Kautilya's Arthashastra described how accounting records should be maintained.

2. **Medieval Period**:
   - **Luca Pacioli (1494)**: Published "Summa de Arithmetica, Geometria, Propor

In [None]:
# testing 6
query="What are the functions of Accounting?"
answer=ask_and_get_answer(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
print(answer)

The main functions of accounting are as follows:

1. **Measurement**: Accounting measures past performance of the business entity and depicts its current financial position.

2. **Forecasting**: Accounting helps in forecasting future performance and financial position of the enterprise using past data and analyzing trends.

3. **Decision-making**: Accounting provides relevant information to the users of accounts to aid rational decision-making.

4. **Comparison & Evaluation**: Accounting assesses performance achieved in relation to targets and discloses information regarding accounting policies and contingent liabilities, which play an important role in predicting, comparing, and evaluating the financial results.

5. **Control**: Accounting identifies weaknesses in the operational system and provides feedback regarding the effectiveness of measures adopted to check such weaknesses.

6. **Government Regulation and Taxation**: Accounting provides necessary information to the government t

In [None]:
# testing 7
query="What are the considerations in determining capital and revenue expenditures?"
answer=ask_and_get_answer(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
print(answer)

The considerations in determining capital and revenue expenditures include:

1. **Nature of Business**: The classification of an expenditure as capital or revenue can depend on the nature of the business. For example, for a trader dealing in furniture, the purchase of furniture is a revenue expenditure, but for other types of businesses, it would be considered a capital expenditure and shown as an asset on the balance sheet.

2. **Recurring Nature of Expenditure**: If an expense occurs frequently within an accounting year, it is typically considered a revenue expenditure. Examples include monthly salaries or rent. Conversely, nonrecurring expenditures, such as the purchase of assets, are generally classified as capital expenditures unless materiality criteria define them otherwise.

3. **Purpose of Expenses**: Expenses incurred for normal maintenance and repairs of an asset are usually revenue in nature. However, expenditures that significantly enhance the asset's productive capacity a

## Ask Questions in Loop

In [None]:
# import time
# i=1
# print("Type 'Quit' or 'Exit' to quit.")

# while True:
#   query=input(f"Question #{i}: ")
#   i=i+1
#   if query.strip().lower() in ["quit", "exit"]:  # call .lower(); the bare method never matches
#     print("Quitting...")
#     time.sleep(2)
#     break

#   answer=ask_and_get_answer(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
#   print(f"Answer:\n{answer}")
#   print(f"\n{'='*50}")
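The exit check must call `query.lower()`; comparing the bound method `query.lower` itself against strings is always False, so the loop would never quit. Here is a testable variant, assuming you inject the input, output, and answering functions. `run_qa_loop` and `EXIT_COMMANDS` are hypothetical names.

```python
EXIT_COMMANDS = {"quit", "exit"}


def run_qa_loop(answer_fn, input_fn=input, print_fn=print):
    """Ask questions until the user types quit/exit; return how many were answered."""
    print_fn("Type 'Quit' or 'Exit' to quit.")
    answered = 0
    while True:
        query = input_fn(f"Question #{answered + 1}: ")
        if query.strip().lower() in EXIT_COMMANDS:  # note the call: .lower()
            print_fn("Quitting...")
            return answered
        answered += 1
        print_fn(f"Answer:\n{answer_fn(query)}")
        print_fn("=" * 50)
```

In the notebook: `run_qa_loop(lambda q: ask_and_get_answer(vector_store, query=q, openai_api_key=OPENAI_API_KEY))`.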

## Update 1: Restrict Answers to the Retrieved Context

In [None]:
def ask_and_get_answer2(vector_store, query, openai_api_key=OPENAI_API_KEY):
    from langchain_openai import ChatOpenAI
    from langchain.chains import RetrievalQA
    from langchain.prompts import PromptTemplate
    llm = ChatOpenAI(
        model="gpt-4o",
        temperature=0,
        max_tokens=None,
        timeout=None,
        max_retries=2,
        api_key=openai_api_key,
    )
    retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

    custom_prompt_template = """
    You are an AI assistant tasked with answering questions based solely on the provided context.

    Context: {context}

    Human: {question}

    AI: Only answer the user's query using the information from the given context or documents. If the query is unrelated to the provided context, respond with "I don't know the answer as it's not related to the information I have." Do not use any external knowledge or make assumptions beyond what's explicitly stated in the context.

    """

    CUSTOM_PROMPT = PromptTemplate(
        template=custom_prompt_template,
        input_variables=["context", "question"]
    )

    # Create the custom chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
        chain_type_kwargs={"prompt": CUSTOM_PROMPT}
    )

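    # Calling the chain directly is deprecated; invoke() returns "result" and "source_documents"
    result = qa_chain.invoke({"query": query})
    return result["result"]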
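The custom `PromptTemplate` only works if the template contains exactly the declared `input_variables`: the "stuff" chain fills `{context}` with the retrieved chunks and `RetrievalQA` passes the query as `{question}`. `PromptTemplate` uses f-string-style placeholders by default, so the standard library's `string.Formatter` can check them before the chain runs. This is a sketch with hypothetical helper names; literal braces in a prompt must be doubled (`{{` and `}}`).

```python
from string import Formatter


def template_fields(template):
    """Return the set of {placeholder} names a str.format-style template expects."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def fill_prompt(template, **values):
    """Fail loudly on missing or unexpected placeholders, then fill the template."""
    expected = template_fields(template)
    if expected != set(values):
        raise KeyError(f"template expects {sorted(expected)}, got {sorted(values)}")
    return template.format(**values)


t = "Context: {context}\n\nHuman: {question}"
print(fill_prompt(t, context="(retrieved chunks)", question="What is a ledger?"))
```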

In [None]:
# testing 1
query="What are the procedural aspects of accounting?"
answer = ask_and_get_answer2(vector_store, query, OPENAI_API_KEY)
print(answer)

The procedural aspects of accounting, based on the provided context, can be divided into two main parts:

1. **Generating Financial Information:**
   - **Recording:** This involves documenting all business transactions of a financial nature in the books of account, such as the Journal or subsidiary books. These records are then transferred to secondary books like the Ledger, where transactions are classified and summarized.
   - **Classification and Summarization:** Transactions are classified in terms of income, expense, assets, and liabilities and summarized in profit and loss accounts and balance sheets.
   - **Measurement:** Transactions and events are measured in terms of money, using the ruling currency of the country.
   - **Interpretation:** The recorded, classified, and summarized transactions and events are interpreted to provide meaningful insights.

2. **Using the Financial Information:**
   - **Decision-making:** Providing relevant information to aid rational decision-maki

In [None]:
# testing 2
query="Explain the procedural aspects of accounting?"
answer = ask_and_get_answer2(vector_store, query, OPENAI_API_KEY)
print(answer)

The procedural aspects of accounting can be divided into two main parts: generating financial information and using the financial information.

### Generating Financial Information

1. **Recording**: This is the basic function of accounting. All business transactions of a financial character are recorded in the books of account, typically in a book called "Journal," which may be divided into several subsidiary books according to the nature and size of the business. This recording is done in primary books, and transactions are later transferred to secondary books, i.e., Ledger, where they are classified and summarized.

2. **Classifying and Summarizing**: Transactions and events are classified in terms of income, expense, assets, and liabilities according to their characteristics and summarized in profit and loss accounts and balance sheets.

3. **Measuring**: Transactions and events are measured in terms of money, using the ruling currency of the country (e.g., rupee in India, dollar i

In [None]:
# testing 3
query="Tell me about Isaac Newton."
answer=ask_and_get_answer2(vector_store, query=query, openai_api_key=OPENAI_API_KEY)
print(answer)

I don't know the answer as it's not related to the information I have.


## Update2

In [None]:
def ask_and_get_answer3(vector_store, query, openai_api_key=OPENAI_API_KEY):
    from langchain_openai import ChatOpenAI
    from langchain.chains import RetrievalQA
    from langchain.prompts import PromptTemplate
    llm = ChatOpenAI(
        model="gpt-4o",
        temperature=0,
        max_tokens=None,
        timeout=None,
        max_retries=2,
        api_key=openai_api_key,
    )
    retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

    custom_prompt_template = """
    You are an AI assistant tasked with answering questions based solely on the provided context.

    Context: {context}

    Human: {question}

    AI: Only answer the user's query using the information from the given context or documents. If the query is unrelated to the provided context, respond with "I don't know the answer as it's not related to the information I have." Do not use any external knowledge or make assumptions beyond what's explicitly stated in the context.

    """

    CUSTOM_PROMPT = PromptTemplate(
        template=custom_prompt_template,
        input_variables=["context", "question"]
    )

    # Create the custom chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
        chain_type_kwargs={"prompt": CUSTOM_PROMPT}
    )

    result = qa_chain.invoke({"query": query})
    return result["result"]

In [None]:
# Testing1
query="What are the procedural aspects of accounting?"
answer = ask_and_get_answer3(vector_store, query, OPENAI_API_KEY)
print(answer)

Here are the relevant excerpts from the document:

Therefore, this requirement of communicating and motivating informed judgement has also become the part of accounting as defined in the widely accepted definition of accounting, given by the American Accounting Association in 1966 which treated accounting as:

"The process of identifying, measuring and communicating economic information to permit informed judgments and decisions by the users of accounts."

In 1970, the Accounting Principles Board (APB) of American Institute of Certified Public Accountants (AICPA) enumerated the functions of accounting as follows:

"The function of accounting is to provide quantitative information, primarily of financial nature, about economic entities, that is needed to be useful in making economic decisions."

Thus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating the results thereof to the persons 

# Advanced Retrieval

### Load Document

In [None]:
# loading the document
pages = load_document("/content/accounting.docx")

Detecting file type ...
docx file type detected!
Loading /content/accounting.docx ...
203 pages parsed successfully!
Done!


In [None]:
pages[1]

Document(metadata={'source': '/content/accounting.docx'}, page_content='Therefore, this requirement of communicating and motivating informed judgement has also become the part of accounting as defined in the widely accepted definition of accounting, given by the American Accounting Association in 1966 which treated accounting as:\n\n"The process of identifying, measuring and communicating economic information to permit informed judgments and decisions by the users of accounts."\n\nIn 1970, the Accounting Principles Board (APB) of American Institute of Certified Public Accountants (AICPA) enumerated the functions of accounting as follows:\n\n"The function of accounting is to provide quantitative information, primarily of financial nature, about economic entities, that is needed to be useful in making economic decisions."\n\nThus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating the re

In [None]:
pages[1].page_content

'Therefore, this requirement of communicating and motivating informed judgement has also become the part of accounting as defined in the widely accepted definition of accounting, given by the American Accounting Association in 1966 which treated accounting as:\n\n"The process of identifying, measuring and communicating economic information to permit informed judgments and decisions by the users of accounts."\n\nIn 1970, the Accounting Principles Board (APB) of American Institute of Certified Public Accountants (AICPA) enumerated the functions of accounting as follows:\n\n"The function of accounting is to provide quantitative information, primarily of financial nature, about economic entities, that is needed to be useful in making economic decisions."\n\nThus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating the results thereof to the persons interested in such information.\n\nThe abo

### Pretty Print Docs

In [None]:
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

### OpenAI Embeddings


In [None]:
from langchain_openai import OpenAIEmbeddings

# Initialize the embedding model
# chunk_size=1 sends one text per embeddings request (the default batches up to 1000 texts)
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    openai_api_key=OPENAI_API_KEY,
    chunk_size=1,
)

## ParentDocumentRetriever

In [None]:
!pip install langchain_core -q

In [None]:
import os
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
from pinecone import Pinecone
from langchain_core.stores import InMemoryStore
from langchain.retrievers import ParentDocumentRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Separate name from upstream_embeddings(chunks=...), which the Ensemble Retriever
# cells below still call and which returns a vector store rather than a retriever
def upstream_parent_embeddings(documents, index_name, openai_api_key=OPENAI_API_KEY, pinecone_api_key=PINECONE_API_KEY):
    # Set the Pinecone API key as an environment variable
    os.environ['PINECONE_API_KEY'] = pinecone_api_key
    pc = Pinecone(api_key=pinecone_api_key)

    # Initialize OpenAI embeddings
    embeddings = OpenAIEmbeddings(
        model="text-embedding-3-large",
        openai_api_key=openai_api_key
    )

    # Initialize Pinecone vector store
    vector_store = PineconeVectorStore.from_existing_index(index_name, embeddings)

    # Initialize text splitters: 480-char child chunks are embedded and searched,
    # their 960-char parent chunks are what the retriever returns
    parent_splitter = RecursiveCharacterTextSplitter(chunk_size=960, chunk_overlap=160)
    child_splitter = RecursiveCharacterTextSplitter(chunk_size=480, chunk_overlap=80)

    # Initialize in-memory store for parent documents
    store = InMemoryStore()

    # Initialize ParentDocumentRetriever
    retriever = ParentDocumentRetriever(
        vectorstore=vector_store,
        docstore=store,
        child_splitter=child_splitter,
        parent_splitter=parent_splitter,
    )

    # Add documents to the retriever
    retriever.add_documents(documents)

    print("Done filling up the Pinecone Vector store and InMemoryStore with the Embeddings (Knowledge)")
    return retriever

In [None]:
%%time
retriever = upstream_parent_embeddings(documents=chunks, index_name="subrata-rag-qa")

Done filling up the Pinecone Vector store and InMemoryStore with the Embeddings (Knowledge)


In [None]:
retriever

ParentDocumentRetriever(vectorstore=<langchain_pinecone.vectorstores.PineconeVectorStore object at 0x7937a714d330>, docstore=<langchain_core.stores.InMemoryStore object at 0x7937a714c490>, child_splitter=<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x7937a714d0f0>, parent_splitter=<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x7937a714d270>)
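The `ParentDocumentRetriever` above searches small child chunks but returns the larger parent chunk each match came from. The dependency-free sketch below shows that two-store idea. It is an illustration under stated assumptions, not LangChain's implementation: chunk sizes are counted in words, a word-overlap score stands in for embedding similarity, and `ToyParentChildRetriever` is a hypothetical class.

```python
# Sketch of the parent/child idea behind ParentDocumentRetriever.
# Assumptions (illustration only): sizes are in words, and word overlap stands
# in for the embedding similarity that Pinecone would compute.
import re


def split_words(text: str, size: int, overlap: int) -> list:
    """Split text into windows of `size` words that share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


class ToyParentChildRetriever:
    def __init__(self, parent_size=50, child_size=4, overlap=1):
        self.parent_size, self.child_size, self.overlap = parent_size, child_size, overlap
        self.docstore = {}   # parent_id -> parent text (the InMemoryStore role)
        self.children = []   # (child text, parent_id) pairs (the vector store role)

    def add_documents(self, texts):
        for text in texts:
            for parent in split_words(text, self.parent_size, self.overlap):
                parent_id = f"parent-{len(self.docstore)}"
                self.docstore[parent_id] = parent
                for child in split_words(parent, self.child_size, self.overlap):
                    self.children.append((child, parent_id))

    def invoke(self, question, k=4):
        """Rank child chunks, then return the distinct parents of the top k children."""
        wanted = tokens(question)
        ranked = sorted(self.children, key=lambda c: len(wanted & tokens(c[0])), reverse=True)
        parent_ids = []
        for _, parent_id in ranked[:k]:
            if parent_id not in parent_ids:
                parent_ids.append(parent_id)
        return [self.docstore[pid] for pid in parent_ids]


toy_retriever = ToyParentChildRetriever()
toy_retriever.add_documents([
    "Book-keeping records every financial transaction in the journal.",
    "Ledger accounts summarise the journal entries by account.",
])
toy_results = toy_retriever.invoke("ledger summarise")
print(toy_results[0])
```

The best match is the 4-word child "Ledger accounts summarise the", but the caller receives the whole parent sentence, which is why `result[0].page_content` below is longer than the 480-character child chunks.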

### Similarity Search

In [None]:
query="Explain the procedural aspects of accounting?"
result=retriever.invoke(query)
print(result)

[Document(metadata={'source': '/content/accounting.docx', 'text': 'Therefore, this requirement of communicating and motivating informed judgement has also become the part of accounting as defined in the widely accepted definition of accounting, given by the American Accounting Association in 1966 which treated accounting as:\n\n"The process of identifying, measuring and communicating economic information to permit informed judgments and decisions by the users of accounts."\n\nIn 1970, the Accounting Principles Board (APB) of American Institute of Certified Public Accountants (AICPA) enumerated the functions of accounting as follows:\n\n"The function of accounting is to provide quantitative information, primarily of financial nature, about economic entities, that is needed to be useful in making economic decisions."\n\nThus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating the results

In [None]:
print(result[0].page_content)

DISTINCTION BETWEEN BOOK-KEEPING AND ACCOUNTING

Some people mistake book-keeping and accounting to be synonymous terms, but in fact they are different from each other. Accounting is a broad subject. It calls for a greater understanding of records obtained from book-keeping and an ability to analyse and interpret the information provided by book-keeping records. Book-keeping is the recording phase while accounting is
concerned with the summarising phase of an accounting system. Book-keeping provides necessary data for accounting and accounting starts where book-keeping ends.

Top of Form

Book-keeping: This is a process primarily concerned with the systematic recording of financial transactions.

Accounting: This process entails summarizing and interpreting the transactions that have been recorded during book-keeping.

Book-keeping: It lays the foundation for the accounting process.


## MultiQuery Retriever

In [None]:
%%time
from langchain_openai import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    api_key=OPENAI_API_KEY
)

mq_retriever=MultiQueryRetriever.from_llm(
    retriever=retriever,
    llm=llm
)

CPU times: user 127 ms, sys: 2.87 ms, total: 130 ms
Wall time: 128 ms
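`MultiQueryRetriever` asks the LLM to rephrase the question several ways, runs every variant through the base retriever, and merges the results so each document appears once. The sketch below shows only the merge step. The query variants are hard-coded (in the notebook gpt-4o writes them), and `toy_keyword_retrieve` is a hypothetical stand-in for the Pinecone retriever.

```python
# Sketch of MultiQueryRetriever's merge step: each query variant is sent to the
# base retriever and the results are combined as an order-preserving unique union.
# Assumptions: variants are hard-coded, and toy_keyword_retrieve is hypothetical.
import re

TOY_CORPUS = [
    "Recording transactions in the journal.",
    "Classifying transactions in the ledger.",
    "Summarising results in the balance sheet.",
    "Interpreting financial statements for users.",
]


def word_set(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_keyword_retrieve(question: str, k: int = 2) -> list:
    """Return up to k corpus entries that share at least one word with the question."""
    scored = [(len(word_set(question) & word_set(doc)), doc) for doc in TOY_CORPUS]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [doc for _, doc in scored[:k]]


def multi_query_retrieve(variants: list, retrieve) -> list:
    """Run every variant and keep each document once, in first-seen order."""
    merged = []
    for variant in variants:
        for doc in retrieve(variant):
            if doc not in merged:
                merged.append(doc)
    return merged


variants = ["recording journal", "journal ledger", "balance sheet summarising"]
merged_docs = multi_query_retrieve(variants, toy_keyword_retrieve)
print(len(merged_docs), merged_docs)
```

Four raw hits collapse to three unique documents here, which is why `len(result)` for the multi-query retriever can be smaller than the number of variants times `k`.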


In [None]:
query="Explain the procedural aspects of accounting?"
result=mq_retriever.invoke(query)
print(len(result))
print(result)

6
[Document(metadata={'source': '/content/accounting.docx', 'text': 'DISTINCTION BETWEEN BOOK-KEEPING AND ACCOUNTING\n\nSome people mistake book-keeping and accounting to be synonymous terms, but in fact they are different from each other. Accounting is a broad subject. It calls for a greater understanding of records obtained from book-keeping and an ability to analyse and interpret the information provided by book-keeping records. Book-keeping is the recording phase while accounting is\nconcerned with the summarising phase of an accounting system. Book-keeping provides necessary data for accounting and accounting starts where book-keeping ends.\n\nTop of Form\n\nBook-keeping: This is a process primarily concerned with the systematic recording of financial transactions.\n\nAccounting: This process entails summarizing and interpreting the transactions that have been recorded during book-keeping.\n\nBook-keeping: It lays the foundation for the accounting process.\n\nAccounting: It acts a

In [None]:
print(result[0].page_content)

DISTINCTION BETWEEN BOOK-KEEPING AND ACCOUNTING

Some people mistake book-keeping and accounting to be synonymous terms, but in fact they are different from each other. Accounting is a broad subject. It calls for a greater understanding of records obtained from book-keeping and an ability to analyse and interpret the information provided by book-keeping records. Book-keeping is the recording phase while accounting is
concerned with the summarising phase of an accounting system. Book-keeping provides necessary data for accounting and accounting starts where book-keeping ends.

Top of Form

Book-keeping: This is a process primarily concerned with the systematic recording of financial transactions.

Accounting: This process entails summarizing and interpreting the transactions that have been recorded during book-keeping.

Book-keeping: It lays the foundation for the accounting process.


## Contextual Compression

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor=LLMChainExtractor.from_llm(llm)
compression_retriever=ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)
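`LLMChainExtractor` asks the LLM to keep only the parts of each retrieved document that are relevant to the query, and documents with nothing relevant are dropped. The sketch below mirrors that data flow with a keyword sentence filter in place of the LLM, so it illustrates the wiring of `ContextualCompressionRetriever`, not the quality of gpt-4o's extraction. `extract_relevant` and `compress_documents` are hypothetical helpers.

```python
# Sketch of the flow behind LLMChainExtractor + ContextualCompressionRetriever:
# each retrieved document is cut down to the parts relevant to the query, and a
# document with nothing relevant is dropped entirely.
# Assumption: a keyword sentence filter stands in for the LLM call.
import re

STOPWORDS = {"a", "an", "and", "are", "explain", "is", "of", "the", "what"}


def content_words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def extract_relevant(doc: str, question: str) -> str:
    """Keep only the sentences that share a content word with the question."""
    wanted = content_words(question)
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    return " ".join(s for s in sentences if wanted & content_words(s))


def compress_documents(docs: list, question: str) -> list:
    """Apply the extractor to every document and drop the ones that come back empty."""
    extracted = (extract_relevant(doc, question) for doc in docs)
    return [text for text in extracted if text]


base_results = [
    "Book-keeping covers procedural aspects of accounting work. A book-keeper may keep all records.",
    "Government regulation needs tax information.",
]
compressed = compress_documents(base_results, "Explain the procedural aspects of accounting?")
print(compressed)
```

Each kept document shrinks and irrelevant ones disappear, so the compressed list is usually shorter than what the base retriever returned.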

In [None]:
%%time
compressed_docs=compression_retriever.invoke(query)
print(len(compressed_docs))
print(compressed_docs)

3
[Document(metadata={'source': '/content/accounting.docx', 'text': "Decision-making: Accounting provides relevant information to the users of accounts to aid rational decision-making.\n\nComparison & Evaluation: Accounting assesses performance achieved in relation to targets and discloses information regarding accounting policies and contingent liabilities which play an important role in predicting, comparing and evaluating the financial results.\n\nControl: Accounting also identifies weaknesses of the operational system and provides feedbacks regarding effectiveness of measures adopted to check such weaknesses.\n\nGovernment Regulation and Taxation: Accounting provides necessary information to the government to exercise control on the entity as well as in collection of tax revenues.\n\nBOOK-KEEPING\n\nBook-keeping is an activity concerned with the recording of financial data relating to business operations in a significant and orderly manner. It covers procedural aspects of accountin

In [None]:
print(compressed_docs[0])

page_content='BOOK-KEEPING

Book-keeping is an activity concerned with the recording of financial data relating to business operations in a significant and orderly manner. It covers procedural aspects of accounting work and embraces record keeping function. Obviously, book-keeping procedures are governed by the end product, the financial statements. The term 'financial statements' means Profit and Loss Account, Balance Sheet and cash flow statements including Schedules and Notes forming part of Accounts.

Book-keeping also requires suitable classification of transactions and events. This is also determined with reference to the requirement of financial statements. A book-keeper may be responsible for keeping all the records of a business or only of a minor segment, such as position of the customers' accounts in a departmental store. Accounting is based on a careful and efficient book-keeping system.' metadata={'source': '/content/accounting.docx', 'text': "Decision-making: Accounting p

In [None]:
pretty_print_docs(compressed_docs)

Document 1:

BOOK-KEEPING

Book-keeping is an activity concerned with the recording of financial data relating to business operations in a significant and orderly manner. It covers procedural aspects of accounting work and embraces record keeping function. Obviously, book-keeping procedures are governed by the end product, the financial statements. The term 'financial statements' means Profit and Loss Account, Balance Sheet and cash flow statements including Schedules and Notes forming part of Accounts.

Book-keeping also requires suitable classification of transactions and events. This is also determined with reference to the requirement of financial statements. A book-keeper may be responsible for keeping all the records of a business or only of a minor segment, such as position of the customers' accounts in a departmental store. Accounting is based on a careful and efficient book-keeping system.
----------------------------------------------------------------------------------------

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import EmbeddingsFilter

# Keep only documents whose embedding similarity to the query exceeds 0.6
embeddings_filter = EmbeddingsFilter(
    embeddings=embeddings,
    similarity_threshold=0.6
)
compression_retriever=ContextualCompressionRetriever(
    base_compressor=embeddings_filter,
    base_retriever=retriever
)

In [None]:
query="Explain the procedural aspects of accounting?"
compression_doc=compression_retriever.invoke(query)
print(len(compression_doc))
print(compression_doc)

1
[_DocumentWithState(metadata={'source': '/content/accounting.docx', 'text': 'Therefore, this requirement of communicating and motivating informed judgement has also become the part of accounting as defined in the widely accepted definition of accounting, given by the American Accounting Association in 1966 which treated accounting as:\n\n"The process of identifying, measuring and communicating economic information to permit informed judgments and decisions by the users of accounts."\n\nIn 1970, the Accounting Principles Board (APB) of American Institute of Certified Public Accountants (AICPA) enumerated the functions of accounting as follows:\n\n"The function of accounting is to provide quantitative information, primarily of financial nature, about economic entities, that is needed to be useful in making economic decisions."\n\nThus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating

In [None]:
compression_doc[0].page_content

'Thus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating the results thereof to the persons interested in such information.\n\nThe above definition requires accountants to assume a bigger responsibility than to merely do book-keeping. Accountants need to be ready to provide the information ready for the intended users to be able to make economic decisions.\n\n\n\nProcedural aspects of Accounting\n\n\n\nOn the basis of the above definitions, procedure of accounting can be basically divided into two parts:\n\nGenerating financial information and\n\nUsing the financial information.\n\nGenerating Financial Information'
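`EmbeddingsFilter` embeds the query and each retrieved document and keeps only the documents whose cosine similarity to the query exceeds `similarity_threshold`. The sketch below reproduces that test with made-up 3-dimensional vectors in place of `text-embedding-3-large` embeddings; the vectors and helper names are assumptions for illustration.

```python
# Sketch of an embeddings-based relevance filter: keep documents whose cosine
# similarity to the query is above the threshold, most similar first.
# Assumption: the 3-D vectors are toy embeddings, not real model outputs.
import math


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_by_similarity(query_vec: list, docs: list, threshold: float = 0.6) -> list:
    """docs is a list of (text, vector); returns (text, score) pairs above the threshold."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    kept = [pair for pair in scored if pair[1] > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


toy_docs = [
    ("Book-keeping records transactions", [0.5, 0.5, 0.0]),
    ("Procedural aspects of accounting", [0.9, 0.1, 0.0]),
    ("Government regulation and taxation", [0.0, 0.2, 0.98]),
]
for text, score in filter_by_similarity([1.0, 0.0, 0.0], toy_docs, threshold=0.6):
    print(f"{score:.3f}  {text}")
```

Raising the threshold trades recall for precision; with 0.6 the notebook's query kept a single document.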

In [None]:
from langchain_community.document_transformers.embeddings_redundant_filter import EmbeddingsRedundantFilter
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers.document_compressors.embeddings_filter import EmbeddingsFilter
from langchain_openai import OpenAIEmbeddings

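# Initialize the embedding model
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    openai_api_key=OPENAI_API_KEY,
    chunk_size=1,
)

# Split retrieved documents into pieces of at most 1500 characters
splitter=RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
# Drop near-duplicate documents (EmbeddingsRedundantFilter takes no splitter argument)
redundant_filter=EmbeddingsRedundantFilter(
    embeddings=embeddings
)
# Keep only documents whose similarity to the query exceeds 0.5
relevant_filter=EmbeddingsFilter(
    embeddings=embeddings,
    similarity_threshold=0.5
)
# Transformers run in order: split -> drop duplicates -> drop irrelevant
pipeline_compressor=DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)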
compression_retriever=ContextualCompressionRetriever(
    base_compressor=pipeline_compressor,
    base_retriever=retriever
)
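`DocumentCompressorPipeline` passes the retrieved documents through each transformer in turn. The sketch below shows the de-duplicate-then-filter part of that flow with toy 2-D vectors. `drop_redundant` is a simplified greedy stand-in for `EmbeddingsRedundantFilter` (keep the first document of any pair above 0.95 similarity), not its exact algorithm; all names and vectors are hypothetical.

```python
# Sketch of a compressor pipeline: documents flow through each step in order,
# first dropping near-duplicates, then dropping documents not similar to the query.
# Assumptions: toy vectors replace embeddings; drop_redundant is a simplified
# greedy version of a redundancy filter.
import math


def cos_sim(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drop_redundant(docs: list, query_vec: list, threshold: float = 0.95) -> list:
    """Keep a document only if it is not too similar to any document already kept."""
    kept = []
    for text, vec in docs:
        if all(cos_sim(vec, kept_vec) <= threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept


def keep_relevant(docs: list, query_vec: list, threshold: float = 0.5) -> list:
    """Keep documents whose similarity to the query exceeds the threshold."""
    return [(text, vec) for text, vec in docs if cos_sim(query_vec, vec) > threshold]


def run_pipeline(docs: list, query_vec: list, steps: list) -> list:
    for step in steps:
        docs = step(docs, query_vec)
    return docs


toy_results = [
    ("Recording in the journal", [1.0, 0.1]),
    ("Recording in the journal (overlapping chunk)", [1.0, 0.11]),
    ("Classifying in the ledger", [0.7, 0.7]),
    ("Tax regulation", [0.1, 1.0]),
]
final_docs = run_pipeline(toy_results, [1.0, 0.0], [drop_redundant, keep_relevant])
print([text for text, _ in final_docs])
```

Ordering matters: removing duplicates before the relevance check avoids embedding comparisons against the query for chunks that would be discarded anyway.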

In [None]:
query="Explain the procedural aspects of accounting?"
compressed_docs=compression_retriever.invoke(query)
print(len(compressed_docs))
print(compressed_docs)

4
[_DocumentWithState(metadata={'source': '/content/accounting.docx', 'text': 'Therefore, this requirement of communicating and motivating informed judgement has also become the part of accounting as defined in the widely accepted definition of accounting, given by the American Accounting Association in 1966 which treated accounting as:\n\n"The process of identifying, measuring and communicating economic information to permit informed judgments and decisions by the users of accounts."\n\nIn 1970, the Accounting Principles Board (APB) of American Institute of Certified Public Accountants (AICPA) enumerated the functions of accounting as follows:\n\n"The function of accounting is to provide quantitative information, primarily of financial nature, about economic entities, that is needed to be useful in making economic decisions."\n\nThus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating

In [None]:
pretty_print_docs(compressed_docs)

Document 1:

Thus, accounting may be defined as the process of recording, classifying, summarising, analysing and interpreting the financial transactions and communicating the results thereof to the persons interested in such information.

The above definition requires accountants to assume a bigger responsibility than to merely do book-keeping. Accountants need to be ready to provide the information ready for the intended users to be able to make economic decisions.



Procedural aspects of Accounting



On the basis of the above definitions, procedure of accounting can be basically divided into two parts:

Generating financial information and

Using the financial information.

Generating Financial Information
----------------------------------------------------------------------------------------------------
Document 2:

As per this definition, accounting is simply an art of record keeping. The process of accounting starts by first identifying the events and transactions which are of

## Ensemble Retriever
**Combining multiple retrieval algorithms to get the most relevant documents.**

In [None]:
!pip install rank_bm25 -qU
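`BM25Retriever` scores chunks with Okapi BM25 (via the `rank_bm25` package installed above), a keyword method that rewards rare query terms and normalises for document length. The sketch below implements the textbook formula under stated assumptions: lowercase word tokenization (the retriever's default preprocessing is a plain whitespace split), `k1=1.5`, `b=0.75`, and the common non-negative IDF variant `log((N - n + 0.5) / (n + 0.5) + 1)`. `rank_bm25` computes IDF somewhat differently, so absolute scores will not match.

```python
# Sketch of Okapi BM25 scoring.
# Assumptions: lowercase word tokens, k1=1.5, b=0.75, non-negative IDF variant.
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())


def bm25_scores(question: str, corpus: list, k1: float = 1.5, b: float = 0.75) -> list:
    docs = [tokenize(doc) for doc in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))  # docs containing each term
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in tokenize(question):
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Term-frequency saturation (k1) and length normalisation (b)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


bm25_corpus = [
    "Recording transactions in the journal",
    "Posting journal entries to the ledger",
    "The ledger is balanced before the trial balance",
]
print(bm25_scores("ledger balance", bm25_corpus))
```

Exact word matches drive the score, which complements the embedding retriever on rare terms such as names or codes.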

In [None]:
%%time
from langchain.retrievers.ensemble import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Build the chunks first so BM25 and Pinecone index the same 500-character chunks
chunks = chunk_data(data=pages, chunk_size=500)

# Sparse (keyword) retriever: top 2 BM25 matches
bm25_retriever = BM25Retriever.from_documents(documents=chunks)
bm25_retriever.k = 2

# Dense retriever: rebuild the Pinecone index from the same chunks, top 5 by similarity
delete_pinecone_indexes()
index = create_index(index_name="subrata-rag-qa", pinecone_api_key=PINECONE_API_KEY)
vector_store = upstream_embeddings(chunks=chunks, index_name="subrata-rag-qa", openai_api_key=OPENAI_API_KEY)
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

ensemble_retriever=EnsembleRetriever(
    retrievers=[bm25_retriever, retriever],
    weights=[0.5, 0.5]
)

Chunking ...
Chunk size 2184.
Done!
Deleting all indexes ...
Deleting index: subrata-rag-qa ...
Creating index subrata-rag-qa ...
Done filling up the Pinecone Vector store with the Embeddings (Knowledge)
CPU times: user 43.3 s, sys: 604 ms, total: 43.9 s
Wall time: 1min 2s
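`EnsembleRetriever` merges the BM25 and Pinecone result lists with weighted Reciprocal Rank Fusion: a document's score is the sum of `weight / (rank + c)` over every list it appears in, with ranks starting at 1 and `c = 60` by default. The sketch below applies that formula to hypothetical string ids in place of `Document` objects.

```python
# Sketch of weighted Reciprocal Rank Fusion (RRF).
# Assumption: documents are identified by plain string ids here.


def weighted_rrf(rankings: list, weights: list, c: int = 60) -> list:
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rank + c)
    # Highest fused score first; each document appears once
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["doc_a", "doc_b"]                              # like bm25_retriever.k = 2
dense_ranking = ["doc_b", "doc_c", "doc_d", "doc_e", "doc_a"]  # like Pinecone with k = 5
fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

`doc_b` wins because it ranks high in both lists. Since duplicates are merged, a `k=2` BM25 list plus a `k=5` dense list yields at most 7 documents; the 6 returned below are consistent with one chunk appearing in both lists.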


In [None]:
query="Explain the procedural aspects of accounting?"
result=ensemble_retriever.invoke(query)
pretty_print_docs(result)

Document 1:

Book-keeping is an activity concerned with the recording of financial data relating to business operations in a significant and orderly manner. It covers procedural aspects of accounting work and embraces record keeping function. Obviously, book-keeping procedures are governed by the end product, the financial statements. The term 'financial statements' means Profit and Loss Account, Balance Sheet and cash flow statements including Schedules and Notes forming part of Accounts.

Book-keeping also requires suitable classification of transactions and events. This is also determined with reference to the requirement of financial statements. A book-keeper may be responsible for keeping all the records of a business or only of a minor segment, such as position of the customers' accounts in a departmental store. Accounting is based on a careful and efficient book-keeping system.
----------------------------------------------------------------------------------------------------
D

In [None]:
print(len(result))
print(result)

6
[Document(metadata={'source': '/content/accounting.docx', 'text': "Book-keeping is an activity concerned with the recording of financial data relating to business operations in a significant and orderly manner. It covers procedural aspects of accounting work and embraces record keeping function. Obviously, book-keeping procedures are governed by the end product, the financial statements. The term 'financial statements' means Profit and Loss Account, Balance Sheet and cash flow statements including Schedules and Notes forming part of Accounts.\n\nBook-keeping also requires suitable classification of transactions and events. This is also determined with reference to the requirement of financial statements. A book-keeper may be responsible for keeping all the records of a business or only of a minor segment, such as position of the customers' accounts in a departmental store. Accounting is based on a careful and efficient book-keeping system."}, page_content="Book-keeping is an activity


## Optimized Code

In [None]:
%%time
from langchain.retrievers.ensemble import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Build the chunks first so BM25 and Pinecone index the same 1500-character chunks
chunks = chunk_data(data=pages, chunk_size=1500)

# Sparse (keyword) retriever: top 2 BM25 matches
bm25_retriever = BM25Retriever.from_documents(documents=chunks)
bm25_retriever.k = 2

# Dense retriever: rebuild the Pinecone index from the same chunks, top 5 by similarity
delete_pinecone_indexes()
index = create_index(index_name="subrata-rag-qa", pinecone_api_key=PINECONE_API_KEY)
vector_store = upstream_embeddings(chunks=chunks, index_name="subrata-rag-qa", openai_api_key=OPENAI_API_KEY)
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, retriever],
    weights=[0.5, 0.5]
)

# Initialize the LLM
llm = ChatOpenAI(
    model="gpt-4",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    api_key=OPENAI_API_KEY
)

Chunking ...
Chunk size 638.
Done!
Deleting all indexes ...
Deleting index: subrata-rag-qa ...
Creating index subrata-rag-qa ...
Done filling up the Pinecone Vector store with the Embeddings (Knowledge)
CPU times: user 15.2 s, sys: 163 ms, total: 15.4 s
Wall time: 28.8 s


In [None]:
%%time
# Create a prompt template for query optimization and answer refinement
optimize_prompt = PromptTemplate(
    input_variables=["query", "retrieved_docs"],
    template="""
    Given the following query and retrieved documents, please perform the following tasks:

    1. Analyze the query and the retrieved documents.
    2. If the query is irrelevant or out of scope for the provided documents, respond with: "I don't have information about this topic in the provided documents. The query appears to be out of scope for the available content."
    3. If the query is relevant, identify the most relevant section that directly answers the query. This section should start with a numbered heading (e.g., "2.1. Procedural aspects of Accounting").
    4. Extract the entire content of the identified section, including all subsections, up to but not including the next main numbered heading (e.g., "3. EVOLUTION OF ACCOUNTING AS A SOCIAL SCIENCE").
    5. Present the extracted content exactly as it appears in the document, without any modifications, summarizations, or additional information.
    6. Format the answer for better readability:
       - Use markdown formatting.
       - Preserve the original headings and subheadings.
       - Do not bold or emphasize any terms unless they are already formatted that way in the original text.
    7. If the query is relevant but doesn't match any specific numbered section, provide the most relevant information from the retrieved documents without extracting entire sections.
    8. Ensure that all information in the answer comes ONLY from the retrieved documents.
    9. Do not add any external information, personal knowledge, or explanations.
    10. If after analyzing the retrieved documents, you find that they don't contain relevant information to answer the query, respond with: "I don't have enough information in the provided documents to answer this query accurately."

    Query: {query}

    Retrieved Documents:
    {retrieved_docs}

    Extracted and Formatted Answer:
    """
)

# Create an LLMChain for query optimization and answer refinement
optimize_chain = LLMChain(llm=llm, prompt=optimize_prompt)

CPU times: user 646 µs, sys: 0 ns, total: 646 µs
Wall time: 3.41 ms


In [None]:
%%time
def optimized_retrieval_and_answer(query):
    retrieved_docs = ensemble_retriever.invoke(query)
    formatted_docs = "\n\n".join([f"Document {i+1}:\n{doc.page_content}" for i, doc in enumerate(retrieved_docs)])

    # Use the LLM to optimize and format the answer; LLMChain.invoke returns a dict
    # whose generated text is stored under the "text" key
    optimized_response = optimize_chain.invoke({"query": query, "retrieved_docs": formatted_docs})["text"]

    return optimized_response

CPU times: user 8 µs, sys: 0 ns, total: 8 µs
Wall time: 11.9 µs


In [None]:
%%time
query="Explain the procedural aspects of accounting?"
answer = optimized_retrieval_and_answer(query)
print(answer)

**Procedural aspects of Accounting**

On the basis of the above definitions, procedure of accounting can be basically divided into two parts:

1. Generating financial information
2. Using the financial information.

**Generating Financial Information**

The first two procedural stages of the process of generating financial information along with the preparation of trial balance are covered under book-keeping while the preparation of financial statements and its analysis, interpretation and also its communication to the various users are considered as accounting stages. Students will learn the term book-keeping and its distinction with accounting, in the coming topics of this unit.

**Using the Financial Information**

As per this definition, accounting is simply an art of record keeping. The process of accounting starts by first identifying the events and transactions which are of financial character and then be recorded in the books of account. Continuing with the same example of the 

## Self-Querying Retrievers

* Not feasible here: we would have to manually define the metadata and attribute info for every document parsed by the LangChain document loader (638 in this case).



## Time Weighted Vector Store Retrievers