## Document Chains Demo

Document Chains allow you to process and analyze large amounts of text data efficiently. They provide a structured approach to working with documents, enabling you to retrieve, filter, refine, and rank them based on specific criteria.

By choosing among Document Chain types such as Stuff, Refine, Map-Reduce, or Map Re-rank, you can apply a different strategy for combining the retrieved documents and obtain more accurate and relevant results.

In [41]:
import os
import getpass
import textwrap

from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

from dotenv import load_dotenv

In [43]:
load_dotenv()

True

In [45]:
model = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0.5)

## Stuff Chain
This approach places ("stuffs") all of the documents into a single prompt, which LangChain’s StuffDocumentsChain sends to the LLM.
It needs only one call to the LLM, and the model sees all the information at once. The trade-off is that the combined text must fit within the model’s context window.
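
The idea can be sketched without LangChain at all. The block below is a minimal illustration, not the library's implementation: `stuff_documents`, `fake_stuff_llm`, and the sample pages are hypothetical names introduced here, and the "model" is a stub that only records the prompts it receives.

```python
from typing import Callable, List


def stuff_documents(pages: List[str], template: str, llm: Callable[[str], str]) -> str:
    """Join all pages into one text block and send it to the model in a single call."""
    combined = "\n\n".join(page.strip() for page in pages)
    return llm(template.format(text=combined))


# Hypothetical stand-in for a real model: it only records the prompts it receives.
stuff_calls: List[str] = []


def fake_stuff_llm(prompt: str) -> str:
    stuff_calls.append(prompt)
    return f"summary built from {len(prompt)} prompt characters"


sample_pages = ["Page one of the resume.", "Page two of the resume."]
sample_template = "You are given a resume as the text below.\n-----\n{text}\n-----"
stuff_result = stuff_documents(sample_pages, sample_template, fake_stuff_llm)

# One model call, and that single prompt contains every page.
print(len(stuff_calls))
print(stuff_calls[0])
```

Because everything goes into one prompt, the prompt grows with the total size of the documents, which is why this strategy suits short inputs such as a resume of a few pages.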

In [48]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Software-Engineer-CV.pdf")
docs = loader.load()

In [50]:
# Print each loaded page with a 1-based document number
for cnt, doc in enumerate(docs, start=1):
    print("---- Document #", cnt)
    print(doc.page_content.strip())

---- Document # 1
Name: Sunil Sharma                              Mobile: +91 9898989898  
 
Designation: Senior Technical Lead                      Mail Id: sunil.sharma @gmail.com  
 
Objective:   
Experienced S enior Software Developer with 1 2 years of hands -on expertise in 
designing, developing, and delivering high -quality software solutions.  
Proven track record of successfully leading and collaborating with cross -functional 
teams to deliver projects on time and within budget. Seeking to leverage my technical 
skills and leadership experience to contribute to innovative software projects.  
Education:  
Bachelor in Engineering in Electronics and Communication  
K.L.N.  College of Information Technology, Madurai - 2007  
Professional Summary:  
• 12 years  of experience in Software Development in C on  Linux Environment . 
• Over 5 years of programming  experience as an Oracle PL/SQL  developer in 
Analysis, Design and Implementation of business application using Oracle DBMS

In [52]:
prompt_template ="""
You are given a Resume as the below text. 
-----
{text}
-----
Question: Please respond with the Key Skills and Experience summary of the person. 
Key Skills:
Experience Summary: 
"""

In [60]:
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

stuff_chain = load_summarize_chain(model,
                             chain_type="stuff",
                             prompt=prompt)

output_summary = stuff_chain.invoke(docs)

In [82]:
print(output_summary["output_text"])


Key Skills: 
1. Software Development 
2. Oracle PL/SQL 
3. Linux Environment 
4. Database Management 
5. Programming Languages: C, Pro C, Shell scripting 
6. Version Control: GIT, TFS, CVS 
7. Tools: PL/SQL developer, JIRA, Confluence, Visual studio, GDB, Mercurial, Spirent Test Centre (STC), Wireshark 
8. Leadership and Team Collaboration 

Experience Summary: 
1. 12 years of experience in Software Development 
2. 5 years of experience as an Oracle PL/SQL developer 
3. Expertise in all stages of Software Development Life Cycle 
4. Experience with Table functions, indexes, Table partitioning, Collections, Analytical functions, and materialized views 
5. Proficient in creating tables, views, constraints, and indexes 
6. Skilled in developing complex DB objects like packages, procedures, functions, and triggers using PL/SQL 
7. Familiarity with Oracle-supplied packages, Dynamic SQL, records, and tables 
8. Experience with SQL Loader for data loading 
9. Knowledge of Oracle performance-r

## Refine Chain
The Refine Documents Chain uses an iterative process to generate a response by analyzing each input document and updating its answer accordingly.

It passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to obtain a new answer for each document.

This chain is ideal for tasks that involve analyzing more documents than can fit in the model’s context, as it only passes a single document to the LLM at a time.
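
The iterative loop described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LangChain's code: `refine_documents`, `fake_refine_llm`, and both prompt strings are hypothetical, and the stub model just returns a numbered draft so the flow of intermediate answers is visible.

```python
from typing import Callable, List

# Simplified, assumed prompts for illustration only.
FIRST_PROMPT = 'Write a concise summary of the following:\n\n"{text}"\n\nCONCISE SUMMARY:'
REFINE_PROMPT = (
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "Refine it (only if needed) with the context below.\n"
    "------------\n{text}\n------------"
)


def refine_documents(pages: List[str], llm: Callable[[str], str]) -> str:
    """Summarize the first page, then refine that answer once per remaining page."""
    if not pages:
        raise ValueError("at least one page is required")
    answer = llm(FIRST_PROMPT.format(text=pages[0]))
    for page in pages[1:]:
        # Each call sees only one page plus the latest intermediate answer.
        answer = llm(REFINE_PROMPT.format(existing_answer=answer, text=page))
    return answer


# Hypothetical stub model: records prompts and returns a numbered draft.
refine_calls: List[str] = []


def fake_refine_llm(prompt: str) -> str:
    refine_calls.append(prompt)
    return f"draft-{len(refine_calls)}"


refine_result = refine_documents(["page A", "page B", "page C"], fake_refine_llm)
print(refine_result)
print(len(refine_calls))
```

The sketch makes the cost visible: one model call per document, executed strictly in sequence, since each step depends on the previous answer.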

In [85]:
refine_chain = load_summarize_chain(model, chain_type="refine")
print(refine_chain.refine_llm_chain.prompt.template)

Your job is to produce a final summary.
We have provided an existing summary up to a certain point: {existing_answer}
We have the opportunity to refine the existing summary (only if needed) with some more context below.
------------
{text}
------------
Given the new context, refine the original summary.
If the context isn't useful, return the original summary.


In [87]:
# invoke() returns a dict; the summary text is under "output_text"
output_summary = refine_chain.invoke(docs)["output_text"]
output_summary

"\n\nSunil Sharma is an experienced Senior Technical Lead with 12 years of experience in software development. He has a proven track record of successfully leading cross-functional teams and delivering projects on time and within budget. Sunil has a Bachelor's degree in Engineering and is skilled in C programming on Linux environment. He also has over 5 years of experience as an Oracle PL/SQL developer and is proficient in all stages of the software development life cycle. Sunil is knowledgeable in Oracle performance-related features and has expertise in creating complex database objects. He has held positions as a Technical Lead at Nokia Networks and a Senior Engineer at Plintron Global Technology Solutions, where he gained experience in leadership and team collaboration. Sunil's skills include PL/SQL, C, Pro C, and shell scripting, as well as experience with version control tools like GIT, TFS, and CVS. He has also worked with various software tools such as JIRA, Confluence, and Visu

## Map-Reduce Chain
The MapReduceDocumentsChain is designed to process large amounts of text efficiently.
It first applies an LLM chain to each document individually (the Map step), producing a new document for each one. All of the new documents are then passed to a separate combine documents chain to get a single output (the Reduce step). If the mapped documents are too large to combine in one call, they are first compressed (collapsed) into fewer, shorter documents.
This compression step is repeated recursively until the result fits.
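
A rough sketch of the Map, collapse, and Reduce steps follows. It is an illustration under assumptions rather than the library's implementation: `map_reduce_documents`, `group_by_size`, and `fake_mr_llm` are hypothetical names, and the size limit is counted in characters here purely to keep the example simple.

```python
from typing import Callable, List


def group_by_size(texts: List[str], max_chars: int) -> List[List[str]]:
    """Greedily pack texts into groups whose combined length stays within max_chars."""
    groups: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in texts:
        if current and size + len(text) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        groups.append(current)
    return groups


def map_reduce_documents(pages: List[str], llm: Callable[[str], str], max_chars: int) -> str:
    # Map step: summarize every page independently.
    summaries = [llm(f"Summarize:\n{page}") for page in pages]
    # Collapse step: while the summaries are too long to combine, merge them in groups.
    while len(summaries) > 1 and sum(len(s) for s in summaries) > max_chars:
        groups = group_by_size(summaries, max_chars)
        if len(groups) == len(summaries):
            break  # nothing can be merged further; stop to avoid looping forever
        summaries = [llm("Combine:\n" + "\n".join(group)) for group in groups]
    # Reduce step: one final call over the remaining summaries.
    return llm("Combine these summaries into one:\n" + "\n".join(summaries))


# Hypothetical stub model: records prompts and returns a short numbered summary.
mr_calls: List[str] = []


def fake_mr_llm(prompt: str) -> str:
    mr_calls.append(prompt)
    return f"s{len(mr_calls)}"


mr_result = map_reduce_documents(["p1", "p2", "p3", "p4"], fake_mr_llm, max_chars=4)
print(mr_result)
print(len(mr_calls))
```

Unlike the refine loop, the Map calls do not depend on each other, so a real implementation is free to run them in parallel.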

In [89]:
map_reduce_chain = load_summarize_chain(model,
                                        chain_type="map_reduce",
                                        verbose=True)

In [90]:
print(map_reduce_chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [91]:
# Use only the first 20 documents to keep the run short
output_summary = map_reduce_chain.invoke(docs[:20])["output_text"]

print(output_summary)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"        
                                                 
Name: Sunil Sharma                              Mobile: +91 9898989898  
 
Designation: Senior Technical Lead                      Mail Id: sunil.sharma @gmail.com  
 
Objective:   
Experienced S enior Software Developer with 1 2 years of hands -on expertise in 
designing, developing, and delivering high -quality software solutions.  
Proven track record of successfully leading and collaborating with cross -functional 
teams to deliver projects on time and within budget. Seeking to leverage my technical 
skills and leadership experience to contribute to innovative software projects.  
Education:  
Bachelor in Engineering in Electronics and Communication  
K.L.N.  College of Information Technology, Madurai - 2007  
Professional Summary:  
• 12 year