#### Set the OpenAI Key

In [1]:
import os
from constants import openai_key

os.environ["OPENAI_API_KEY"] = openai_key
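If `constants.py` is missing, the import above fails before the key is ever set. A minimal sketch of a fallback that accepts either an explicit key or one already exported in the environment. The helper name `resolve_api_key` and its parameters are hypothetical and not part of the original notebook.

```python
import os

def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY", environ=None):
    """Return the explicit key if given, otherwise the value stored under env_var."""
    environ = os.environ if environ is None else environ
    key = explicit_key or environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of a later authentication error.
        raise RuntimeError(f"{env_var} is not set and no key was passed in")
    return key

# Demo with a stand-in mapping, so no real key is needed.
demo_env = {"OPENAI_API_KEY": "sk-demo"}
print(resolve_api_key(environ=demo_env))
```

Keeping the key out of the notebook itself (in an environment variable or an untracked file) avoids committing it by accident.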

#### PDF Query Using Langchain

In [2]:
!pip install langchain
!pip install openai
!pip install PyPDF2    
!pip install faiss-cpu
!pip install tiktoken



In [3]:
from PyPDF2 import PdfReader    # read text from PDF files
from langchain.embeddings.openai import OpenAIEmbeddings   # turn text into vectors so relatedness can be measured
from langchain.text_splitter import CharacterTextSplitter   # split text on a separator character
from langchain.vectorstores import FAISS  # in-memory index for the vectors

In [4]:
# Path to the PDF file to query.
pdf_reader = PdfReader("data/budget_speech.pdf")

In [5]:
# read text from pdf, skipping pages with no extractable text
raw_text = ''
for page in pdf_reader.pages:
    content = page.extract_text()
    if content:
        raw_text += content
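The extraction loop above can also be written as a small function, which makes it easy to try without a real PDF. This is a sketch under assumptions: `collect_page_text` and `FakePage` are hypothetical names, and the only thing required of a page object is an `extract_text()` method that may return an empty value.

```python
def collect_page_text(pages):
    """Join the text of every page, skipping pages with no extractable text."""
    parts = []
    for page in pages:
        content = page.extract_text()
        if content:
            parts.append(content)
    # Unlike plain concatenation, joining with a newline keeps a separator
    # at every page boundary for the splitter to use.
    return "\n".join(parts)

class FakePage:
    """Stand-in for a PDF page object; only extract_text() is needed."""
    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text

pages = [FakePage("Page one"), FakePage(None), FakePage("Page three")]
print(collect_page_text(pages))
```

In the notebook this would be called as `collect_page_text(pdf_reader.pages)`.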

In [6]:
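# Split the text on newlines into chunks of at most 800 characters with a 200-character overlap.
# length_function=len counts characters, not tokens, so each chunk stays well within the model's token limit.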
text_splitter = CharacterTextSplitter(
    separator='\n',
    chunk_size=800,
    chunk_overlap=200,
    length_function=len,
)

texts = text_splitter.split_text(raw_text)
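To make `chunk_size` and `chunk_overlap` concrete, here is a simplified pure-Python sketch of separator-based chunking with overlap. It illustrates the idea only and is not LangChain's implementation; the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, separator="\n", chunk_size=800, chunk_overlap=200):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters, carrying a tail of at most chunk_overlap
    characters into the next chunk."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current, current_len = [], [], 0
    for piece in pieces:
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(separator.join(current))
            # Drop pieces from the front until the tail fits the overlap
            # budget and still leaves room for the incoming piece.
            while current and (
                current_len > chunk_overlap
                or current_len + len(separator) + len(piece) > chunk_size
            ):
                removed = current.pop(0)
                current_len -= len(removed) + (len(separator) if current else 0)
            added = len(piece) + (len(separator) if current else 0)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks

demo = "\n".join(["aaaa", "bbbb", "cccc", "dddd"])
print(split_with_overlap(demo, chunk_size=10, chunk_overlap=4))
# ['aaaa\nbbbb', 'bbbb\ncccc', 'cccc\ndddd']
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. A single piece longer than `chunk_size` is kept as an oversized chunk in this sketch.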

In [7]:
len(texts)

160

In [8]:
# Create the OpenAI embeddings client; the vectors are requested later, when the texts are indexed
embeddings = OpenAIEmbeddings()


In [9]:
# Embed every chunk in `texts` and build a FAISS index over the resulting vectors

document_search = FAISS.from_texts(texts, embeddings)

In [10]:
document_search

<langchain_community.vectorstores.faiss.FAISS at 0x73f43b5b6e70>
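Conceptually, the index maps each chunk to a vector and answers a query by returning the chunks whose vectors lie closest to the query vector. The brute-force sketch below shows that idea with made-up three-dimensional vectors. It assumes Euclidean (L2) distance and a default of four results, which to my understanding match the defaults of LangChain's FAISS wrapper; the names `l2_distance` and `nearest_texts` are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_texts(query_vec, indexed, k=4):
    """Return the k texts whose vectors are closest to query_vec."""
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]

# Toy "embeddings"; real ones come from the embedding model and have many more dimensions.
indexed = [
    ("agriculture allocation", [0.9, 0.1, 0.0]),
    ("urban housing", [0.1, 0.9, 0.0]),
    ("crop productivity", [0.8, 0.2, 0.1]),
]
print(nearest_texts([1.0, 0.0, 0.0], indexed, k=2))
# ['agriculture allocation', 'crop productivity']
```

FAISS performs the same kind of search with optimized index structures instead of sorting every vector on each query.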

In [11]:
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

In [12]:
chain = load_qa_chain(OpenAI(), chain_type='stuff')

`load_qa_chain` is deprecated; migration guides by `chain_type`:
stuff: https://python.langchain.com/docs/versions/migrating_chains/stuff_docs_chain
map_reduce: https://python.langchain.com/docs/versions/migrating_chains/map_reduce_chain
refine: https://python.langchain.com/docs/versions/migrating_chains/refine_chain
map_rerank: https://python.langchain.com/docs/versions/migrating_chains/map_rerank_docs_chain

See also guides on retrieval and question-answering here: https://python.langchain.com/docs/how_to/#qa-with-rag
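The `stuff` chain type places all retrieved documents into a single prompt and sends it to the model in one call. A rough sketch of that idea follows; the prompt wording and the function name `build_stuff_prompt` are assumptions made for illustration, not LangChain's actual template.

```python
def build_stuff_prompt(documents, question):
    """Concatenate all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

docs = ["Chunk about agriculture.", "Chunk about employment."]
print(build_stuff_prompt(docs, "What is planned for agriculture?"))
```

Because everything goes into one prompt, this approach only works while the combined chunks fit in the model's context window, which is one reason the text is split into small chunks first.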


In [26]:
query = "How much the agriculture target will be"
docs = document_search.similarity_search(query)
chain.run(input_documents=docs, question=query)

' There is no specific mention of a target for agriculture in the given context. However, the government has announced various initiatives and policies aimed at increasing productivity, resilience, and self-sufficiency in the agriculture sector.'

In [25]:
query = "Productivity and resilience in Agriculture"
docs = document_search.similarity_search(query)
chain.run(input_documents=docs, question=query)

' The government plans to undertake a comprehensive review of the agriculture research setup to focus on raising productivity and developing climate resilient varieties. They will also release new high-yielding and climate-resilient varieties of crops for cultivation and promote natural farming practices. Additionally, they aim to strengthen the production, storage, and marketing of pulses and oilseeds, as well as develop large scale clusters for vegetable production. They will also facilitate the implementation of a Digital Public Infrastructure for Agriculture in partnership with states.'
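Each query cell repeats the same three lines: set the query, retrieve similar chunks, run the chain. A small helper can wrap that pattern. This is a sketch: `ask` is a hypothetical name, and the search and answer steps are passed in as callables so the example runs without an API key.

```python
def ask(query, search, answer):
    """Retrieve documents for the query, then answer it from those documents."""
    docs = search(query)
    # Model outputs often start with whitespace, so strip it.
    return answer(docs, query).strip()

# Stand-ins for the vector store and the QA chain.
def fake_search(query):
    return [f"chunk mentioning {query}"]

def fake_answer(docs, question):
    return f"  Answer to '{question}' using {len(docs)} chunk(s). "

print(ask("agriculture", fake_search, fake_answer))
```

In the notebook, the same helper could be called as `ask(query, document_search.similarity_search, lambda docs, q: chain.run(input_documents=docs, question=q))`.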

In [27]:
query = "how much for agriculture and allied sector"
docs = document_search.similarity_search(query)
chain.run(input_documents=docs, question=query)


' The provision for agriculture and allied sector is ` 1.52 lakh crore.'

In [23]:
query = "Vikas bhi Virasat bhi"
docs = document_search.similarity_search(query)
chain.run(input_documents=docs, question=query)

' Vikas bhi Virasat bhi is a phrase used in the context of the Indian government\'s development plans for the future. It means "development with heritage" and emphasizes the importance of preserving and celebrating India\'s cultural and historical heritage while also promoting economic growth and development. '

In [22]:
query = "Employment and Investment"
docs = document_search.similarity_search(query)
chain.run(input_documents=docs, question=query)


' \nThere are multiple pieces of context provided in the given information related to employment and investment. The government plans to implement schemes for employment linked incentives and will focus on recognition of first-time employees. The use of technology has been successful in improving productivity and bridging inequality in the economy. The government also plans to encourage private investment in infrastructure through various means such as viability gap funding and a market-based financing framework. In addition, there are plans to launch Phase IV of the Pradhan Mantri Gram Sadak Yojana (PMGSY) to provide all-weather connectivity to rural habitations. The government also plans to sanction industrial parks and facilitate rental housing in PPP mode for industrial workers. Furthermore, there will be ownership, leasing and flagging reforms in the shipping industry to generate more employment. Lastly, a Critical Mineral Mission will be set up to boost domestic production and ac