### Document Chains

#### Document chains let you process and analyse large amounts of text data efficiently. They provide a structured approach to working with documents, enabling you to retrieve, filter, and rank them based on specific criteria

#### By choosing among the different types of Document Chains (Stuff, Refine, Map Reduce, Map Re-rank), you can apply a specific processing strategy to the retrieved documents and obtain more accurate and relevant results
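
Map Re-rank is named in the list above but is not demonstrated in this notebook. As a rough sketch of the idea, assuming each document is answered on its own together with a confidence score and the highest-scoring answer is returned: the helper names `map_rerank` and `keyword_scorer` below are hypothetical, and the scorer is a plain-Python stand-in for an LLM call, not LangChain's API.

```python
def map_rerank(texts, question, scored_llm):
    """Answer the question against each text separately and keep the best-scoring answer."""
    # Each call sees only one text, so the calls are independent of each other.
    candidates = [scored_llm(question, text) for text in texts]  # list of (answer, score)
    return max(candidates, key=lambda pair: pair[1])


def keyword_scorer(question, text):
    """Hypothetical stand-in for an LLM that returns (answer, confidence score)."""
    words = [w.strip("?.,").lower() for w in question.split()]
    score = sum(1 for w in words if w and w in text.lower())
    return text, score


best_answer, best_score = map_rerank(
    ["Skills: Python, PySpark", "Education: B.Tech in Civil Engineering"],
    "Which skills include Python?",
    keyword_scorer,
)
print(best_answer, best_score)
```

Unlike the other three chain types, this strategy does not merge information across documents; it only selects the single best answer.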

In [2]:
import os
import getpass
import textwrap

from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

from dotenv import load_dotenv
load_dotenv()

True

In [6]:
model = OpenAI(model_name = "gpt-3.5-turbo-instruct", temperature = 0.5)

### Stuff Chain
#### This involves passing all the information to the LLM in one go, i.e. placing all the relevant data into a single prompt for LangChain's Stuff Documents Chain to process. The advantage of this method is that it requires only one call to the LLM, and the model has access to all the information at once. The limitation is that the combined text must fit within the model's context window
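
To make the "stuffing" step concrete, here is a minimal sketch in plain Python of how several pages could be joined into one prompt. The `Page` class and `build_stuff_prompt` helper are hypothetical stand-ins for illustration (they are not LangChain classes), and the blank-line separator is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a loaded document; only page_content is modeled."""
    page_content: str


def build_stuff_prompt(pages, template, separator="\n\n"):
    """Join every page into one string and place it in the template's {text} slot."""
    combined = separator.join(p.page_content.strip() for p in pages)
    return template.format(text=combined)


pages = [Page("Page one text."), Page("Page two text.")]
template = "Summarize the following:\n----\n{text}\n----"
stuff_prompt = build_stuff_prompt(pages, template)
print(stuff_prompt)
```

Because everything lands in one prompt, the prompt grows with the number of pages, which is why this approach only suits inputs that fit in the model's context.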

In [9]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Priyanuj_Misra_Resume_AI.pdf")
docs = loader.load()

In [11]:
cnt = 0

for doc in docs:
    cnt+=1
    print("------------Document No: ", cnt)
    print(doc.page_content.strip())

------------Document No:  1
PriyanujMisra
Senior Data Scientist
Dubai,UAE
 GitHub
 LinkedIn
 priyanujmisra.nits@gmail.com
 +971506138031
PROFESSIONALSUMMARY
Senior Data Scientist with 5+ years of experience in leveraging Advanced Analytics and Artificial Intelligence to provide ac-
tionable insights for Fortune 50 global leaders in BFSI, Retail and Telecom industries. Proficient in implementing end-to-end
data modeling pipelines and solutions using Python, PySpark, and Driverless AI, with expertise in Machine Learning and Deep
Learning algorithms.
EXPERIENCE
Etisalate&UAE | SENIOR DATA SCIENTIST
Aug2024–Present|Dubai,UAE
Ô Roles and Responsibilities - Part of the CVM Modeling team delivering Machine
Learning and Generative AI solutions to optimize business needs and drive
revenue. Currently working on preparing a SOP for Information Retrieval using Gen
AI (LLM)
Ô Customer Rejection Reason Analysis - Developed a Topic Modeling Pipeline
using Latent Dirichlet Allocation (LDA) to systemat

In [23]:
prompt_template = """
 You are given a Resume as the below text.
 ----
 {text}
 ----
 Question: Please respond with the Key Skills and Experience summary of the person.
 Key Skills:
 Experience Summary:
 Note: Please put the key skills one by one in a separate line
"""

In [25]:
prompt = PromptTemplate(template = prompt_template, input_variables = ["text"])

stuff_chain = load_summarize_chain(model,
                                   chain_type = "stuff",
                                   prompt = prompt)

output_summary = stuff_chain.run(docs)
print(output_summary)


Key Skills:
- Advanced Analytics
- Artificial Intelligence
- Python
- PySpark
- Driverless AI
- Machine Learning
- Deep Learning
- Data Modeling
- Data Pipelines
- Customer Analytics
- Campaign Analytics
- Generative AI
- Natural Language Processing (NLP)
- Transformers
- Causal Inference
- Word Embeddings
- Large Language Models
- A/B Testing
- Hypothesis Testing
- Customer Rejection Analysis
- Feature Tracking
- SOP Development
- ModelOps
- Meta Learners
- Propensity Modeling
- Offer Management
- Google BigQuery
- Cloud Platforms (Google Cloud Platform, Azure Databricks)
- Version Control (GitHub)
- Statistical Analysis
- Project Management

Experience Summary:
- Senior Data Scientist with 5+ years of experience
- Worked with Fortune 50 global leaders in BFSI, Retail, and Telecom industries
- Proficient in leveraging Advanced Analytics and Artificial Intelligence to provide actionable insights
- Experienced in implementing end-to-end data modeling pipelines and solutions using Pytho

### Refine Chain

#### The Refine documents chain generates a response iteratively: it loops over the input documents and updates its answer with each one. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to obtain a new answer. This chain is ideal for tasks that involve more documents than can fit in the model's context, since it passes only a single document to the LLM at a time. The trade-off is that it makes one LLM call per document, and each call depends on the previous one
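
The iterative loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not LangChain's implementation: `refine_summarize` and `recording_llm` are hypothetical names, and the model call is replaced by a stub that records each prompt and returns a placeholder string.

```python
def refine_summarize(texts, llm):
    """Sequentially fold each text into a running summary; one LLM call per text."""
    if not texts:
        return ""
    # First document: produce an initial summary.
    summary = llm(f"Write a concise summary of the following:\n{texts[0]}")
    # Remaining documents: refine the existing summary with the new context.
    for text in texts[1:]:
        summary = llm(
            "We have provided an existing summary up to a certain point: "
            f"{summary}\n"
            f"Refine it (only if needed) with this context:\n{text}"
        )
    return summary


calls = []


def recording_llm(prompt):
    """Hypothetical stand-in for a model call; records the prompt, returns a placeholder."""
    calls.append(prompt)
    return f"summary-v{len(calls)}"


result = refine_summarize(["page 1", "page 2", "page 3"], recording_llm)
print(result, len(calls))
```

Each call after the first receives the previous summary, so the calls must run one after another.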

In [28]:
refine_chain = load_summarize_chain(model, chain_type="refine")

print(refine_chain.refine_llm_chain.prompt.template)

Your job is to produce a final summary.
We have provided an existing summary up to a certain point: {existing_answer}
We have the opportunity to refine the existing summary (only if needed) with some more context below.
------------
{text}
------------
Given the new context, refine the original summary.
If the context isn't useful, return the original summary.


In [30]:
output_summary = refine_chain.run(docs)
print(output_summary)



Priyanuj Misra is a Senior Data Scientist with 5+ years of experience in Advanced Analytics and Artificial Intelligence. He has worked with Fortune 50 companies in BFSI, Retail, and Telecom industries. His expertise includes end-to-end data modeling using Python, PySpark, and Driverless AI, with a focus on Machine Learning and Deep Learning algorithms. He has experience in developing solutions for tasks such as customer rejection analysis, loan approval prediction, and marketing campaign measurement. Priyanuj holds a MS in Machine Learning and AI from Liverpool John Moores University and a B.Tech in Civil Engineering from NITS. His skills include Python, SQL, Pyspark, Google Cloud Platform, Azure Databricks, and GitHub. He is recognized for his significant impact on improving customer engagement and redemption rates through his propensity model.


### Map-Reduce Chain
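#### Like Refine, this chain is used when there are too many documents to fit in a single prompt
#### The Refine chain is sequential, whereas Map-Reduce summarizes each document independently (the map step, which can run in parallel) and then combines those summaries into a final one (the reduce step)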
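
A minimal sketch of the map and reduce steps in plain Python follows. The names `map_reduce_summarize` and `echo_llm` are hypothetical, the model call is a stub, and the thread pool is only an assumption used to show that the map calls are independent; it is not a claim about how LangChain schedules its calls.

```python
from concurrent.futures import ThreadPoolExecutor

PROMPT = "Write a concise summary of the following:\n"


def map_reduce_summarize(texts, llm, max_workers=4):
    """Summarize each text independently (map), then summarize the summaries (reduce)."""
    # Map step: no call depends on another, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_summaries = list(pool.map(lambda t: llm(PROMPT + t), texts))
    # Reduce step: one final call over the combined partial summaries.
    combined = "\n\n".join(partial_summaries)
    return llm(PROMPT + combined)


def echo_llm(prompt):
    """Hypothetical stand-in for a model call; wraps the prompt body in S(...)."""
    body = prompt.split("\n", 1)[1]
    return f"S({body})"


final_summary = map_reduce_summarize(["a", "b"], echo_llm)
print(final_summary)
```

With n documents this makes n map calls plus at least one reduce call, compared with a single call for the Stuff chain.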

In [36]:
map_reduce_chain = load_summarize_chain(model, chain_type="map_reduce", verbose = True)

print(map_reduce_chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [38]:
output_summary = map_reduce_chain.run(docs)
print(output_summary)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"PriyanujMisra
Senior Data Scientist
Dubai,UAE
 GitHub
 LinkedIn
 priyanujmisra.nits@gmail.com
 +971506138031
PROFESSIONALSUMMARY
Senior Data Scientist with 5+ years of experience in leveraging Advanced Analytics and Artificial Intelligence to provide ac-
tionable insights for Fortune 50 global leaders in BFSI, Retail and Telecom industries. Proficient in implementing end-to-end
data modeling pipelines and solutions using Python, PySpark, and Driverless AI, with expertise in Machine Learning and Deep
Learning algorithms.
EXPERIENCE
Etisalate&UAE | SENIOR DATA SCIENTIST
Aug2024–Present|Dubai,UAE
Ô Roles and Responsibilities - Part of the CVM Modeling team delivering Machine
Learning and Generative AI solutions to optimize business needs and drive
revenue. Currently working on preparing a SOP for Information

### To see the difference between these chains more clearly, load a document that has more than one page