# Contract Document Analyzer

## Install Dependencies

In [1]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


## Setup Files

### Import API Key

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

True
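
`load_dotenv()` returning `True` only says that at least one variable was loaded from a `.env` file; it does not confirm that the API key itself is present. A minimal sketch of an explicit check follows. The helper name `require_env` is hypothetical, and the variable name `GOOGLE_API_KEY` is an assumption about how the key is stored in this project.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Example usage in the notebook (assumes the key is stored as GOOGLE_API_KEY):
# api_key = require_env("GOOGLE_API_KEY")
```

Failing early here gives a clearer error than an authentication failure later, when the model is first called.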

## Load Documents

In [3]:
from langchain.document_loaders import PyPDFLoader

pdfs_source = [os.path.join("contracts", f) for f in os.listdir("contracts") if f.endswith(".pdf")]

pdfs = []
for source in pdfs_source:
    pdf_loader = PyPDFLoader(source)
    pdf = pdf_loader.load()
    pdfs.append(pdf)

Ignoring wrong pointing object 6 0 (offset 0)
Ignoring wrong pointing object 8 0 (offset 0)
Ignoring wrong pointing object 10 0 (offset 0)
Ignoring wrong pointing object 12 0 (offset 0)
Ignoring wrong pointing object 14 0 (offset 0)
Ignoring wrong pointing object 20 0 (offset 0)
Ignoring wrong pointing object 22 0 (offset 0)
Ignoring wrong pointing object 24 0 (offset 0)
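
The listing step above relies on `os.listdir`, which returns entries in arbitrary order, and on `endswith(".pdf")`, which skips files named `.PDF`. A small sketch of a more predictable listing, using only the standard library; the helper name `list_pdfs` is hypothetical.

```python
from pathlib import Path


def list_pdfs(folder: str) -> list[str]:
    """Return sorted paths of the PDF files in `folder` (extension matched case-insensitively)."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Example usage in the notebook:
# pdfs_source = list_pdfs("contracts")
```

Sorting keeps indices such as `pdfs[0]` stable between runs, which makes the inspection cells easier to compare.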


In [4]:
print(pdfs[0][0].page_content)
print(pdfs[0][0].metadata)

  WEBSITE DEVELOPMENT CONTRACT FORM Client:   Contact:  Address: Phone:    E-Mail:   Services: [Name and briefly describe any additional services not in the form below]  Website Development Services (the “Service(s)”).      Service Fee:  $______________ per month, payable in advance, subject to the terms of Section 2.5 herein. Initial Service Term:  For subscription agreements – delete if not applicable [One] Year  Service Capacity: ___________________   [Note: include any limits on usage.]   Also, if additional fees will be required for overages, include details here or in fees section above]  Improvement or Optimization Services:  Company will use commercially reasonable efforts to provide Client the services described in the Statement of Work (“SOW”) attached as Exhibit A hereto (“Implementation Services”), and Client shall pay Company the Improvement Fee in accordance with the terms herein. Improvement Services Fee (one-time):  $____________   (Feel free to modify the form above to

## Preparing Data

### Chunking Documents

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n", "\n\n", "."],
    chunk_size=1000,
    chunk_overlap=200,
)

chunks = []

for pdf in pdfs:
    pdf_chunk = text_splitter.split_documents(pdf)
    chunks.extend(pdf_chunk)

In [6]:
print(chunks[1].page_content)
print(chunks[1].metadata)

. Improvement Services Fee (one-time):  $____________   (Feel free to modify the form above to match whatever you want to add into the contract below, remove this text)
{'source': 'contracts/MonthlySubscriptionContract.pdf', 'page': 0}
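
To make the relationship between `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one splits on the separators first and then merges pieces); it only slides a fixed-size window over the text so that neighbouring chunks share `chunk_overlap` characters. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far the window advances each time
    # Stop once the remaining text is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


sample = "".join(str(i % 10) for i in range(2500))
demo_chunks = sliding_chunks(sample)
```

With 2,500 characters this yields three chunks starting at offsets 0, 800 and 1600, and the last 200 characters of each chunk reappear at the start of the next one.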


## Model

### Load Model and Embeddings

In [7]:
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.vectorstores import FAISS

# temperature=0: we don't want creative answers when extracting from existing documents
# Rate limit: 1 million tokens per minute (TPM)
llm = ChatGoogleGenerativeAI(temperature=0, model="gemini-1.5-flash")
max_tokens_per_min = 1e6

# Loading Gemini Embeddings and Embedding document chunks to vectors
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector_store = FAISS.from_documents(chunks, embedding=embeddings)

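
Conceptually, the vector store keeps one embedding vector per chunk and, at query time, returns the chunks whose vectors lie closest to the query vector. The sketch below illustrates that lookup with made-up three-dimensional vectors and a brute-force search. Euclidean distance is an assumption here; the metric the real index uses depends on how it is configured, and real embeddings have far more dimensions.

```python
import math


def nearest(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    distances = [(math.dist(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    return [i for _, i in sorted(distances)[:k]]


# Toy vectors standing in for chunk embeddings (values are invented for illustration).
toy_vectors = [
    [0.9, 0.1, 0.0],  # pretend: a chunk about fees
    [0.1, 0.8, 0.1],  # pretend: a chunk about insurance
    [0.8, 0.2, 0.1],  # pretend: another chunk about fees
]
toy_query = [1.0, 0.0, 0.0]  # pretend: a question about fees

top = nearest(toy_query, toy_vectors, k=2)
print(top)  # → [0, 2]
```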


### Prompt

In [8]:
from langchain.prompts import PromptTemplate

prompt_string = """
You are a contract assistant. Based on the following context and conversation history, extract the information needed to answer the question below.
If the answer is not present in the provided context, respond with “I don't have sufficient information to address that question.”
Never provide incorrect information. If the query is a greeting, respond politely.

Context: {summaries}

Conversation History: {history}

Query: {question}
"""

prompt = PromptTemplate(
    template=prompt_string, input_variables=["summaries", "history", "question"]
)
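
For simple named placeholders like these, filling the template behaves much like Python's own `str.format`. The sketch below uses plain string formatting rather than `PromptTemplate`, with invented values for the three variables, to show what the model would roughly receive.

```python
# A shortened copy of the template; only the three placeholders matter here.
template = (
    "Context: {summaries}\n\n"
    "Conversation History: {history}\n\n"
    "Query: {question}\n"
)

# All values below are invented for illustration.
filled = template.format(
    summaries="The Service Fee is payable monthly in advance.",
    history="",
    question="When is the Service Fee due?",
)
print(filled)
```

An empty `history` simply leaves that section blank, which is what the first question in a conversation looks like.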

### Implementing Question Answering Chain

In [9]:
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.memory import ConversationBufferWindowMemory

# chain = RetrievalQAWithSourcesChain.from_chain_type(
#     llm=llm,
#     chain_type="stuff",
#     retriever=vector_store.as_retriever(),
#     return_source_documents=True,
#     chain_type_kwargs={
#         "prompt": prompt,
#         "memory": ConversationBufferWindowMemory(
#             k=4,
#             memory_key="history",
#             input_key="question",
#         ),
#     },
# )
chain = RetrievalQAWithSourcesChain.from_llm(
    llm=llm,
    max_tokens_limit=int(max_tokens_per_min),
    reduce_k_below_max_tokens=True,
    retriever=vector_store.as_retriever(),
    memory=ConversationBufferWindowMemory(
        k=4,
        memory_key="history",
        input_key="question",
        output_key="answer",
    ),
)

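
The `k=4` setting means the memory keeps only the four most recent question/answer exchanges and forgets older ones. A minimal standard-library sketch of that windowing behaviour follows; the class `WindowMemory` is hypothetical and is not how LangChain implements it.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (question, answer) exchanges."""

    def __init__(self, k: int = 4):
        # A deque with maxlen silently drops the oldest item when full.
        self.turns = deque(maxlen=k)

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def history(self) -> str:
        """Render the remembered exchanges as one block of text."""
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


window = WindowMemory(k=4)
for i in range(6):
    window.save(f"question {i}", f"answer {i}")

print(window.history())  # exchanges 2 to 5 remain; 0 and 1 were dropped
```

A bounded window keeps the prompt from growing without limit over a long conversation, at the cost of losing earlier turns.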


### Running Inference

In [10]:
import langchain

query = "Summarize Maintenance Agreement"

langchain.debug = True

# k is number of docs to return
# context = vector_store.similarity_search(query , k=4)
response = chain({"question": query}, return_only_outputs=True)



[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
[0m{
  "question": "Summarize Maintenance Agreement",
  "history": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain > chain:LLMChain] Entering Chain run with input:
[0m{
  "input_list": [
    {
      "context": ".  ARTICLE XIV. INSURANCE.  Client agrees to obtain and maintain during the term of this Agreement all insurance coverage necessary to guard again all risks of loss that may arise out of, or relating to, this Agreement, including business interruption insurance. ARTICLE XV. CAPTIONS.",
      "question": "Summarize Maintenance Agreement",
      "history": ""
    },
    {
      "context": ".  Monthly Service Fee shall be delivered  by  Client  to  Company  contemporaneously  w

In [11]:
print("Result:", response["answer"])
print("Sources: ", response["sources"])

Result: FINAL ANSWER: This agreement is between the Client and Meanbee for web development, design, support and consultancy services for a 6 month period. Meanbee agrees to devote a certain number of days per month on assignments to be determined by the Client. 

Sources:  contracts/MaintenanceAgreement.pdf
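
The answer above still carries the `FINAL ANSWER:` marker and trailing whitespace from the model output. A small post-processing sketch, assuming the marker always appears at the start of the text when it appears at all; the helper name `clean_answer` is hypothetical.

```python
def clean_answer(raw: str, prefix: str = "FINAL ANSWER:") -> str:
    """Strip surrounding whitespace and a leading marker, if present."""
    text = raw.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


# Example usage in the notebook:
# print("Result:", clean_answer(response["answer"]))
```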
