# Document Chains

<b>Document Chains</b> let you process and analyze large amounts of text efficiently. They provide a structured way to work with documents: you can retrieve, filter, refine, and rank them based on specific criteria.

By using different types of Document Chains, such as <b>Stuff, Refine, Map-Reduce, or Map Re-rank</b>, you can perform specific operations on the retrieved documents and obtain more accurate and relevant results.

In [36]:
import os
from dotenv import load_dotenv
import getpass
import openai
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, MapReduceChain, load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

In [37]:
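# Set up the model: read OPENAI_API_KEY from a local .env file
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
openai.api_key = api_key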

In [38]:
model = ChatOpenAI(temperature=0)

## Stuff Chain

The Stuff approach puts all relevant data into a single prompt, using LangChain's StuffDocumentsChain. Its advantage is that it requires only one call to the LLM, and the model has access to all the information at once. The trade-off is that the combined documents must fit within the model's context window.
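As a rough illustration of what "stuffing" means, the sketch below joins every page into one string and fills the single `{text}` slot of a prompt. It is plain Python, not LangChain's implementation: the `Page` class, the `stuff_prompt` helper, and the `"\n\n"` separator are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a loaded document; only page_content is modeled."""
    page_content: str


PROMPT_TEMPLATE = (
    "You are given a Resume as the below text.\n"
    "----------\n"
    "{text}\n"
    "----------\n"
    "Question: Please respond with the Key Skills and Experience summary of the person.\n"
)


def stuff_prompt(pages, template=PROMPT_TEMPLATE, separator="\n\n"):
    """Join every page into one string and fill the single {text} slot."""
    combined = separator.join(page.page_content for page in pages)
    return template.format(text=combined)


# Two toy pages; a real run would use the pages loaded from the PDF.
pages = [Page("Name: Sunil Sharma"), Page("12 years of experience in C on Linux")]
prompt = stuff_prompt(pages)
print(prompt)
```

Because everything ends up in one prompt, a single LLM call is enough, but the prompt grows with every page that is added.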

In [39]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Software-Engineer-CV.pdf")
docs = loader.load()

In [40]:
# Print each loaded page; PyPDFLoader yields one Document per PDF page
for cnt, doc in enumerate(docs, start=1):
    print("-------Document #", cnt)
    print(doc.page_content.strip())

-------Document # 1
Name: Sunil Sharma                              Mobile: +91 9898989898  
 
Designation: Senior Technical Lead                      Mail Id: sunil.sharma @gmail.com  
 
Objective:   
Experienced S enior Software Developer with 1 2 years of hands -on expertise in 
designing, developing, and delivering high -quality software solutions.  
Proven track record of successfully leading and collaborating with cross -functional 
teams to deliver projects on time and within budget. Seeking to leverage my technical 
skills and leadership experience to contribute to innovative software projects.  
Education:  
Bachelor in Engineering in Electronics and Communication  
K.L.N.  College of Information Technology, Madurai - 2007  
Professional Summary:  
• 12 years  of experience in Software Development in C on  Linux Environment . 
• Over 5 years of programming  experience as an Oracle PL/SQL  developer in 
Analysis, Design and Implementation of business application using Oracle DB

In [41]:
prompt_template = """
You are given a Resume as the below text.
----------
{text}
----------
Question: Please respond with the Key Skills and Experience summary of the person.
Key Skill:
Experience Summary:
"""

In [57]:
# Wrap the template; {text} is filled with the combined document contents
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
stuff_chain = load_summarize_chain(model,
                                   chain_type="stuff",
                                   prompt=prompt,
                                   verbose=True
                                  )
print(stuff_chain.llm_chain.prompt.template)


You are given a Resume as the below text.
----------
{text}
----------
Question: Please respond with the Key Skills and Experience summary of the person.
Key Skill:
Experience Summary:



In [59]:
output_summary = stuff_chain.invoke(docs)



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

You are given a Resume as the below text.
----------
        
                                                 
Name: Sunil Sharma                              Mobile: +91 9898989898  
 
Designation: Senior Technical Lead                      Mail Id: sunil.sharma @gmail.com  
 
Objective:   
Experienced S enior Software Developer with 1 2 years of hands -on expertise in 
designing, developing, and delivering high -quality software solutions.  
Proven track record of successfully leading and collaborating with cross -functional 
teams to deliver projects on time and within budget. Seeking to leverage my technical 
skills and leadership experience to contribute to innovative software projects.  
Education:  
Bachelor in Engineering in Electronics and Communication  
K.L.N.  College of Information Technology, Madurai - 2007  
Professional Summary:  
• 12

In [60]:
print(output_summary["output_text"])

Key Skills:
- Programming Languages: PL/SQL, C, Pro C, Shell scripting
- Database Management: Oracle
- Operating System: Linux
- Version Control: GIT, TFS, CVS
- Tools: PL/SQL developer, JIRA, Confluence, Visual studio, GDB, Mercurial, Spirent Test Centre (STC), Wireshark
- Leadership and Team Collaboration

Experience Summary:
- 12 years of experience in Software Development in C on Linux Environment
- Over 5 years of programming experience as an Oracle PL/SQL developer
- Experience in all stages of Software Development Life Cycle
- Expertise in creating complex DB objects using PL/SQL
- Good knowledge of Oracle performance-related features
- Experience in leading and collaborating with cross-functional teams
- Experience in healthcare domain, L2/L3 protocol development, and software project management.


## Refine Chain

The <b>Refine Documents Chain</b> generates a response iteratively: it analyzes each input document in turn and updates its answer accordingly.

For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to obtain a new answer.

This chain is well suited to tasks that involve more documents than fit in the model's context window, because it passes only one document to the LLM at a time.
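The iterative process can be sketched in plain Python as a loop that carries a running summary from one document to the next. This is a simplified illustration, not LangChain's code: `refine_summarize` and `fake_llm` are hypothetical names, and `llm` is assumed to be any callable that maps a prompt string to a response string. The two prompt strings mirror the default prompts this notebook prints.

```python
INITIAL_PROMPT = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

REFINE_PROMPT = (
    "Your job is to produce a final summary.\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary (only if needed) "
    "with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary.\n"
    "If the context isn't useful, return the original summary."
)


def refine_summarize(texts, llm):
    """Fold the documents into one running summary, one LLM call per document."""
    if not texts:
        raise ValueError("need at least one document")
    # The first document gets a plain summarization prompt.
    summary = llm(INITIAL_PROMPT.format(text=texts[0]))
    # Every later document is combined with the latest intermediate answer.
    for text in texts[1:]:
        summary = llm(REFINE_PROMPT.format(existing_answer=summary, text=text))
    return summary


# A fake LLM (assumption for this demo) that records prompts instead of calling an API.
calls = []

def fake_llm(prompt):
    calls.append(prompt)
    return f"summary v{len(calls)}"


result = refine_summarize(["page one", "page two", "page three"], fake_llm)
print(result, len(calls))
```

The calls are strictly sequential, since each one needs the previous answer, so the number of LLM calls equals the number of documents.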

In [44]:
# No custom prompt is passed, so the chain uses its default refine prompt
refine_chain = load_summarize_chain(model,
                                   chain_type="refine",
                                   verbose=True
                                  )
print(refine_chain.refine_llm_chain.prompt.template)

Your job is to produce a final summary.
We have provided an existing summary up to a certain point: {existing_answer}
We have the opportunity to refine the existing summary (only if needed) with some more context below.
------------
{text}
------------
Given the new context, refine the original summary.
If the context isn't useful, return the original summary.


In [45]:
output_summary = refine_chain.invoke(docs)
output_summary



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"        
                                                 
Name: Sunil Sharma                              Mobile: +91 9898989898  
 
Designation: Senior Technical Lead                      Mail Id: sunil.sharma @gmail.com  
 
Objective:   
Experienced S enior Software Developer with 1 2 years of hands -on expertise in 
designing, developing, and delivering high -quality software solutions.  
Proven track record of successfully leading and collaborating with cross -functional 
teams to deliver projects on time and within budget. Seeking to leverage my technical 
skills and leadership experience to contribute to innovative software projects.  
Education:  
Bachelor in Engineering in Electronics and Communication  
K.L.N.  College of Information Technology, Madurai - 2007  
Professional Summary:  
• 12 years  

{'input_documents': [Document(page_content='        \n                                                 \nName: Sunil Sharma                              Mobile: +91 9898989898  \n \nDesignation: Senior Technical Lead                      Mail Id: sunil.sharma @gmail.com  \n \nObjective:   \nExperienced S enior Software Developer with 1 2 years of hands -on expertise in \ndesigning, developing, and delivering high -quality software solutions.  \nProven track record of successfully leading and collaborating with cross -functional \nteams to deliver projects on time and within budget. Seeking to leverage my technical \nskills and leadership experience to contribute to innovative software projects.  \nEducation:  \nBachelor in Engineering in Electronics and Communication  \nK.L.N.  College of Information Technology, Madurai - 2007  \nProfessional Summary:  \n• 12 years  of experience in Software Development in C on  Linux Environment . \n• Over 5 years of programming  experience as an Orac

In [46]:
print(output_summary["output_text"])

Sunil Sharma is a Senior Technical Lead with 12 years of experience in software development, specializing in Oracle PL/SQL. He has a Bachelor's degree in Engineering and expertise in all stages of the Software Development Life Cycle. Sunil has a proven track record of successfully leading cross-functional teams and delivering projects on time and within budget. He has worked in various roles at companies such as HCL Technologies, Nokia Networks, and Plintron Global Technology Solutions Pvt. Ltd. Sunil's skills include programming languages such as PL/SQL, C, Pro C, and Shell scripting, database management with Oracle, and proficiency in tools like JIRA, Confluence, and Wireshark. Currently, he is working as the lead developer in a healthcare project based in the USA, leading offshore development activities and handling a PL SQL development team. Sunil has expertise in solving complex SQL problems related to reporting, creating indexes and partitioning tables for SQL tuning, writing com

## Map-Reduce Chain

MapReduceDocumentsChain processes large document sets by summarizing each document independently and then combining the results.

It applies an LLM chain to each document individually <b>(the Map step)</b>, producing a new document for each. All the new documents are then passed to a separate combine documents chain to produce a single output <b>(the Reduce step)</b>. If the mapped documents are too large to fit in one prompt, they can be compressed before being passed to the combine documents chain; this compression step is applied recursively.
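The Map, compress, and Reduce steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not LangChain's implementation: `map_reduce_summarize`, `group_by_size`, and `fake_llm` are hypothetical names, `llm` is any callable from prompt string to response string, and the size limit is measured in characters (`max_chars`) rather than tokens to keep the example self-contained.

```python
MAP_PROMPT = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'


def group_by_size(items, max_chars):
    """Pack consecutive items into groups whose total length stays within max_chars."""
    groups, current, size = [], [], 0
    for item in items:
        if current and size + len(item) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(item)
        size += len(item)
    if current:
        groups.append(current)
    return groups


def map_reduce_summarize(texts, llm, max_chars=3000):
    # Map step: one independent LLM call per document.
    summaries = [llm(MAP_PROMPT.format(text=t)) for t in texts]
    # Compression step: while the summaries are too long to combine in one
    # prompt, merge them group by group and summarize each group again.
    while sum(len(s) for s in summaries) > max_chars:
        groups = group_by_size(summaries, max_chars)
        if len(groups) == len(summaries):
            break  # every summary is already alone in its group; nothing to merge
        summaries = [llm(MAP_PROMPT.format(text="\n\n".join(g))) for g in groups]
    # Reduce step: a single call over the (possibly compressed) summaries.
    return llm(MAP_PROMPT.format(text="\n\n".join(summaries)))


# A fake LLM (assumption for this demo) that always answers with 20 characters.
calls = []

def fake_llm(prompt):
    calls.append(prompt)
    return "x" * 20


result = map_reduce_summarize(["a" * 50, "b" * 50, "c" * 50, "d" * 50], fake_llm, max_chars=50)
print(len(calls))  # 4 map calls + 2 compression calls + 1 reduce call = 7
```

Unlike the Refine loop, the Map calls do not depend on each other, so they could be issued in parallel; the cost is that each document is summarized without seeing the others.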

In [55]:
# No custom prompt is passed, so the chain uses its default map prompt
map_reduce_chain = load_summarize_chain(model,
                                        chain_type="map_reduce",
                                        verbose=True
                                       )

print(map_reduce_chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [61]:
output_summary = map_reduce_chain.invoke(docs)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"        
                                                 
Name: Sunil Sharma                              Mobile: +91 9898989898  
 
Designation: Senior Technical Lead                      Mail Id: sunil.sharma @gmail.com  
 
Objective:   
Experienced S enior Software Developer with 1 2 years of hands -on expertise in 
designing, developing, and delivering high -quality software solutions.  
Proven track record of successfully leading and collaborating with cross -functional 
teams to deliver projects on time and within budget. Seeking to leverage my technical 
skills and leadership experience to contribute to innovative software projects.  
Education:  
Bachelor in Engineering in Electronics and Communication  
K.L.N.  College of Information Technology, Madurai - 2007  
Professional Summary:  
• 12 year

In [63]:
print(output_summary["output_text"])

Sunil Sharma is a Senior Technical Lead with 12 years of experience in software development, specializing in Oracle PL/SQL development. He has led cross-functional teams on various projects, including in the healthcare and telecommunications domains. Sunil is currently working on a healthcare project in the USA, overseeing offshore development activities and managing a PL SQL development team. He has expertise in solving complex SQL problems, developing procedures and functions, and client interaction. Additionally, he has experience in L2/L3 protocol development at Nokia Networks.
