# Querying PDF With LangChain

#### Pre-requisites:

Request an OpenAI key if haven't already at: [OpenAI API Key](https://cassio.org/start_here/#llm-access) 

#### Next:
- Setup: import dependencies such as langchain_community, langchain-openai, and langchain_openai

In [30]:
!pip install -qU pypdf langchain_community
!pip install -qU langchain-openai
!pip install langchain_openai


[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


Defaulting to user installation because normal site-packages is not writeable



[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


Note: you may need to restart the kernel to use updated packages.

In [50]:
from langchain_community.document_loaders import PyPDFLoader

# provide the path of the Trust Document PDF file
file_path = r'C:\Users\wangs\OneDrive\Desktop\TRUST_DOC.pdf'
loader = PyPDFLoader(file_path)
docs = loader.load()

In [57]:
from pypdf import PdfReader
pdfreader = PdfReader(r'C:\Users\wangs\OneDrive\Desktop\TRUST_DOC.pdf')
# read text from pdf
raw_text = ''
for i, page in enumerate(pdfreader.pages):
    content = page.extract_text()
    if content:
        raw_text += content

raw_text   

num_of_pages = len(pdfreader.pages)
num_of_words = len(raw_text)
num_new_lines = raw_text.count("\n")
period_counts = raw_text.count(".")
print("Number of pages:", num_of_pages, "\n"
      "Number of words:", num_of_words, "\n"
      "Number of new lines:", num_new_lines, "\n"
      "Number of periods:" ,period_counts)

Number of pages: 16 
Number of words: 31651 
Number of new lines: 482 
Number of periods: 194


In [41]:
#Sample some content of the document
print(docs[13].page_content)
print(docs[1].metadata)

 
 
8. ALIENATION.   No disposition of, or charge or encumbrance on, the income or 
principal of the trust or any part thereof by any beneficiary under this Declaration, by way of 
anticipation, shall be valid or in any way binding upon the Trustee, and no beneficiary shall have the 
right to assign, transfer, encumber or otherwise dispose of such income or principal or any part thereof 
until the same shall be paid or distributed to such beneficiary by the Trustee.  No income or principal 
or any part thereof shall in any way be liable to any claim of any creditor of any such beneficiary.  No 
court shall order payment of trust property pursuant to New York Estate Powers and Trust Law (EPTL) 
§7-1.6 or otherwise.  It is the intent of this Decl aration that only the Trustee shall determine when, and 
in what amounts principal or income shall be paid. 
 
9. CHANGES BY WILL. This trust may not be amended or revoked by the provisions of 
any will or codicil of Grantor pursuant to EPTL §7-

### Select Model to Use

In [6]:
import getpass
import os

if any(['VSCODE' in x for x in os.environ.keys()]):
       print('Please enter password in the prompt above')
os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")

Please enter password in the prompt above


In [43]:
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# We need to split the text using Character Text Split such that it should not increase token size
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, #need to encompass words so that it doesn't lose sentence completeness/contextualization
                                               chunk_overlap=200, 
                                               length_function = len)
splits = text_splitter.split_documents(docs)


### Load the dataset into the vector store

In [None]:
vectorstore = InMemoryVectorStore.from_documents(
    documents=splits, embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

### Run the QA cycle

Simply run the cells and ask a question -- or `quit` to stop. (you can also stop execution with the "▪" button on the top toolbar)

Here are some suggested questions:

In [45]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. "
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

results = rag_chain.invoke({"input": "I am a Trust Officer. John Client is no more. Donna Client has requested $800,000 in order to purchase a new home. Should I approve this distribution? Can you read the document and help me make the decision according to the clauses of the Trust?"})

results

{'input': 'I am a Trust Officer. John Client is no more. Donna Client has request $800,000 in order to purchase a new home. Should I approve this distribution? Can you read the document and help me make the decision according to the clauses of the Trust?',
 'context': [Document(id='54330022-55aa-4b7b-a0fe-79eef9afe52f', metadata={'source': 'C:\\Users\\wangs\\OneDrive\\Desktop\\TRUST_DOC.pdf', 'page': 2}, page_content='DECLARATION OF TRUST 1\nJOHN CLIENT TRUST 2 \nTHIS DECLARATION, made the _______ day of November, 2015 by JOHN H. \nCLIENT, of 123 Main St., Syracuse, NY 13202 (hereinafter referred to as "Grantor" and "Trustee"); \nW I T N E S S E T H : \n1. TRUST PROPERTY.   The Grantor has this day delivered the property described in\nSchedule "A", attached hereto, to the Trustee and does hereby transfer ownership of such property.3\nThe Trustee agrees to act as Trustee of such assets and to hold, administer and distribute the \nproperty, together with all additions thereto and all rei

## Similarity Search With Score

While-loop for prompting more questions. 
Example questions: 
''' 
Given the situation that a client has recently set up a revocable living trust and wants to ensure that the trust will adequately provide
for thier minor grandchildren if they pass away unexpectedly. They are particularly concerned about how the funds will be managed 
until their grandchildren reach adulthood. 

The trust officer's question is: I have a client, John H. Client, who is concerned about the provisions in his trust for his minor 
nieces and nephews. He wants to ensure that if he passes away, the trust will manage the funds responsibly for their benefit until 
they reach adult. Specifically, he is asking who will be responsible for managing the funds, what the funds can be used for, and when 
the minors will have access to their inheritance. Can you provide details on how the trust addresses these concerns? 
'''

'''
A client is concerned about how their revocable trust will handle distributions to beneficiaries if the primary beneficiary, who is
currently healthy, later becomes incapacitated. Can you provide details on how the trust addresses these concerns?
'''

In [60]:
first_question = True
while True:
    if first_question:
        query_text = input("\nEnter your question (or type 'quit' to exit): ").strip()
    else:
        query_text = input("\nWhat's your next question (or type 'quit' to exit): ").strip()

    if query_text.lower() == "quit":
        break

    if query_text == "":
        continue

    first_question = False

    print("\nQUESTION: \"%s\"" % query_text)
    answer = rag_chain.invoke({"input": query_text})
    print("ANSWER: \"%s\"\n" % answer)

    print("FIRST DOCUMENTS BY RELEVANCE:")
    for doc, score in vectorstore.similarity_search_with_score(query_text, k=3):
        print("    [%0.4f] \"%s ...\"" % (score, doc.page_content[:100]))



QUESTION: "I am a Trust Officer. John Client is no more. Donna Client has request $800,000 in order to purchase a new home. Should I approve this distribution? Can you read the document and help me make the decision according to the clauses of the Trust?"
ANSWER: "{'input': 'I am a Trust Officer. John Client is no more. Donna Client has request $800,000 in order to purchase a new home. Should I approve this distribution? Can you read the document and help me make the decision according to the clauses of the Trust?', 'context': [Document(id='54330022-55aa-4b7b-a0fe-79eef9afe52f', metadata={'source': 'C:\\Users\\wangs\\OneDrive\\Desktop\\TRUST_DOC.pdf', 'page': 2}, page_content='DECLARATION OF TRUST 1\nJOHN CLIENT TRUST 2 \nTHIS DECLARATION, made the _______ day of November, 2015 by JOHN H. \nCLIENT, of 123 Main St., Syracuse, NY 13202 (hereinafter referred to as "Grantor" and "Trustee"); \nW I T N E S S E T H : \n1. TRUST PROPERTY.   The Grantor has this day delivered the property desc

In [61]:
answer

{'input': 'I am a Trust Officer. John Client is no more. Donna Client has request $800,000 in order to purchase a new home. Should I approve this distribution? Can you read the document and help me make the decision according to the clauses of the Trust?',
 'context': [Document(id='54330022-55aa-4b7b-a0fe-79eef9afe52f', metadata={'source': 'C:\\Users\\wangs\\OneDrive\\Desktop\\TRUST_DOC.pdf', 'page': 2}, page_content='DECLARATION OF TRUST 1\nJOHN CLIENT TRUST 2 \nTHIS DECLARATION, made the _______ day of November, 2015 by JOHN H. \nCLIENT, of 123 Main St., Syracuse, NY 13202 (hereinafter referred to as "Grantor" and "Trustee"); \nW I T N E S S E T H : \n1. TRUST PROPERTY.   The Grantor has this day delivered the property described in\nSchedule "A", attached hereto, to the Trustee and does hereby transfer ownership of such property.3\nThe Trustee agrees to act as Trustee of such assets and to hold, administer and distribute the \nproperty, together with all additions thereto and all rei

In [58]:
print(results["context"][0].page_content)
print(results["context"][0].metadata)

DECLARATION OF TRUST 1
JOHN CLIENT TRUST 2 
THIS DECLARATION, made the _______ day of November, 2015 by JOHN H. 
CLIENT, of 123 Main St., Syracuse, NY 13202 (hereinafter referred to as "Grantor" and "Trustee"); 
W I T N E S S E T H : 
1. TRUST PROPERTY.   The Grantor has this day delivered the property described in
Schedule "A", attached hereto, to the Trustee and does hereby transfer ownership of such property.3
The Trustee agrees to act as Trustee of such assets and to hold, administer and distribute the 
property, together with all additions thereto and all reinvestments thereof, as the principal of a trust 
estate for the benefit of Grantor in accordance with the terms and provision
s hereinafter set out. 
1 Since the abolishment of the merger doctrine, an individual may create a trust with his 
own assets and act as sole Trustee.  If the document establishing such an entity involves 
only one party, it would not be an agreement, but a declaration.  If any other party is
{'source':