# [Document Chains in LangChain 🦜🔗](https://www.comet.com/site/blog/mastering-document-chains-in-langchain/)

**Document Chains in LangChain** pass a set of documents to an LLM in a structured way. This brings several benefits:

* *Efficient Document Processing*: 
    * Process and analyze large amounts of text data efficiently.
    * A structured approach that enables:
        * Retrieval
        * Filtering
        * Refining
        * Ranking by specific criteria

* *Task Decomposition*: 
    * Break down complex tasks into smaller, manageable subtasks. 
    * Different types of **Document Chains** (`Stuff`, `Refine`, `Map Reduce`, and `Map Re-rank`) each apply a specific operation to the retrieved documents, which helps produce accurate, relevant results.

* *Improved Accuracy*: 
    * **Document Chains** (especially `Map Re-rank` chains) can improve the accuracy of responses.
    * By running an initial prompt on each document and returning the highest-scoring response, they prioritize the most reliable, accurate answers.
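
The Map Re-rank idea described above can be sketched in plain Python. This is a rough illustration, not LangChain's implementation: `map_rerank` and `toy_answer_fn` are hypothetical names, and the scoring function is a stand-in for an LLM call that returns an answer together with a confidence score.

```python
from typing import Callable, List, Tuple


def map_rerank(docs: List[str],
               answer_fn: Callable[[str], Tuple[str, float]]) -> str:
    """Run answer_fn on every document and keep the highest-scoring answer."""
    if not docs:
        raise ValueError("need at least one document")
    # Map step: one (answer, score) pair per document.
    scored = [answer_fn(doc) for doc in docs]
    # Re-rank step: pick the pair with the largest score.
    best_answer, _best_score = max(scored, key=lambda pair: pair[1])
    return best_answer


def toy_answer_fn(doc: str) -> Tuple[str, float]:
    """Hypothetical stand-in for an LLM: scores by how often 'virtue' appears."""
    score = float(doc.lower().count("virtue"))
    return f"Answer drawn from: {doc}", score


docs = ["On anger.", "Virtue is its own reward; virtue endures.", "On virtue."]
best = map_rerank(docs, toy_answer_fn)
print(best)
```

In a real chain the score would come from the model itself, so its quality depends on how well the prompt elicits a meaningful confidence value.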


> # Setup

In [2]:
%pip install langchain langchain_community openai tiktoken -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.1.2[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [8]:
import os
import getpass
import textwrap

# OpenAI is imported from langchain_community (installed above); the
# top-level `from langchain import OpenAI` import is deprecated
from langchain_community.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
# we will cover docstores and splitters in more detail when we get to retrieval
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

In [9]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter Your OpenAI API Key:")

In [10]:
!wget -O meditations.txt https://www.gutenberg.org/files/2680/2680-0.txt

--2024-10-09 13:53:01--  https://www.gutenberg.org/files/2680/2680-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 425351 (415K) [text/plain]
Saving to: ‘meditations.txt’


2024-10-09 13:53:04 (275 KB/s) - ‘meditations.txt’ saved [425351/425351]



> # Chunking docs

In [11]:
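# read the file that the wget cell above saved to the working directory
with open('meditations.txt', encoding='utf-8') as f:
    meditations = f.read()

# skip the first 575 lines, which precede "THE FIRST BOOK"
meditations = "\n".join(meditations.split("\n")[575:])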

# splits the text on a separator character, then packs the pieces into chunks
text_splitter = CharacterTextSplitter(
    separator='\n',
    chunk_size=1500,
    chunk_overlap=200,
    length_function=len,
)

meditations_chunks = text_splitter.split_text(meditations)

docs = [Document(page_content=t) for t in meditations_chunks]
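
To make `chunk_size` and `chunk_overlap` more concrete, here is a simplified, standard-library-only sketch of separator-based chunking with overlap. It is an approximation written for illustration, not the code `CharacterTextSplitter` actually runs; `split_with_overlap` is a hypothetical name.

```python
from typing import List


def split_with_overlap(text: str, separator: str = "\n",
                       chunk_size: int = 1500,
                       chunk_overlap: int = 200) -> List[str]:
    """Greedily pack separator-delimited pieces into overlapping chunks."""
    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current: List[str] = []   # pieces in the chunk being built
    current_len = 0           # length of separator.join(current)
    for piece in pieces:
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(separator.join(current))
            # Keep only a tail of at most chunk_overlap characters that
            # still leaves room for the incoming piece.
            while current and (
                current_len > chunk_overlap
                or current_len + len(separator) + len(piece) > chunk_size
            ):
                removed = current.pop(0)
                current_len -= len(removed) + (len(separator) if current else 0)
        current.append(piece)
        current_len += len(piece) + (len(separator) if len(current) > 1 else 0)
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "\n".join(["aaaa", "bbbb", "cccc", "dddd"])
sample_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(sample_chunks)
```

With these toy settings each chunk repeats the last line of the previous one, which is the point of the overlap: context that straddles a chunk boundary is not lost.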

> # `Stuff` Chain

* Provides context to the LLM through the *stuffing method*
    * Puts all relevant data into a single prompt, which LangChain’s `StuffDocumentsChain` then processes
---
* ✅ Only requires one call to the LLM (the model has access to all information at once)
* ❌ May result in a prompt that exceeds the context limit, so it is suitable for smaller amounts of data only!
---
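
The trade-off above can be illustrated with a small sketch of what "stuffing" amounts to: joining every document into one prompt and checking it against a size budget. This is an assumption-laden illustration, not `StuffDocumentsChain` itself; `build_stuff_prompt` is a hypothetical helper, and the budget is measured in characters as a rough proxy for the model's token limit.

```python
from typing import List


def build_stuff_prompt(template: str, docs: List[str], max_chars: int) -> str:
    """Join all documents into the {text} slot of a single prompt."""
    stuffed = "\n\n".join(docs)
    prompt = template.format(text=stuffed)
    # Character count is only a rough proxy for the real token limit.
    if len(prompt) > max_chars:
        raise ValueError(
            f"prompt has {len(prompt)} characters, budget is {max_chars}; "
            "use fewer documents or a different chain type"
        )
    return prompt


stuff_prompt = build_stuff_prompt(
    "Summarize:\n{text}\nSUMMARY:", ["first doc", "second doc"], max_chars=200
)
print(stuff_prompt)
```

A real check would count tokens (for example with `tiktoken`, which the setup cell installs) rather than characters.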

In [12]:
prompt_template ="""
Write a short 90s west coast gangster rap about the virtues learned from
various family members and how they helped you get by in times of crisis. Use
modern terminology where appropriate:

{text}

RAP:
"""
llm = OpenAI(temperature=0.7)
rap_prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

stuff_chain = load_summarize_chain(llm,
                             chain_type="stuff",
                             prompt=rap_prompt)

# we can't fit the entire book of meditations in the context window, so
# take a slice of it
output_summary = stuff_chain.run(docs[:5]) # here we specify what we are stuffing into the prompt at {text}

print(output_summary)



Verse 1:
Listen up y'all, I'm about to drop some real knowledge
Got it from my fam, they taught me how to survive in this savage
First up was my grandpa, Verus, he showed me how to be meek
And keep my cool, even when others try to make me freak
My momma, she taught me to be religious and kind
To always do the right thing, and leave bad intentions behind
My great-grandpa, he taught me to study hard and well
And not to waste my time, or be swayed by a fancy spell

Chorus:
My family, they taught me virtues, yeah they showed me the way
In times of crisis, I know just what to do and what to say
They taught me to be strong, and to never back down
In this crazy world, they helped me keep my crown

Verse 2:
My homie Diognetus, he taught me to keep it real
Not to believe in things that are just for show and thrill
No quails for me, I don't need to prove my worth
Just focus on what's important, and stay grounded on this Earth
Rusticus, he showed me that my life needed some change


> # `Refine` Chain

* **Iterative process** to generate a response
    * Analyzes each input document in turn
    * Updates the answer accordingly
* For **each document**, passes the following to an LLM chain to get a new answer:
    * All non-document inputs
    * The current document
    * The latest intermediate answer
* Starts by running an initial prompt on the first document and generating an output
* For the remaining documents, **passes in the previous output along with the next document** and asks the LLM to **refine the output based on the new document**
---
* ✅ Ideal when there are **more documents than can fit in the model’s context** (only a single document is passed to the LLM at a time)
* ❌ May perform poorly on tasks that require cross-referencing between documents or detailed information from multiple documents
* ❌ Makes far more LLM calls than the `Stuff` chain (one call per document)
    * Calls are not independent, so they cannot run in parallel as in the `Map Reduce` chain
* ❌ Results may depend on the order in which the documents are analyzed
---
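
The refine loop described above can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's `RefineDocumentsChain`; `refine` and `fake_llm` are hypothetical names, and `fake_llm` stands in for a real model call so the control flow can be run without an API key.

```python
from typing import Callable, List


def refine(docs: List[str], llm: Callable[[str], str],
           initial_template: str, refine_template: str) -> str:
    """Summarize the first document, then refine the answer one document at a time."""
    if not docs:
        raise ValueError("need at least one document")
    # Initial prompt on the first document only.
    answer = llm(initial_template.format(text=docs[0]))
    # Each later call depends on the previous answer, so calls are sequential.
    for doc in docs[1:]:
        answer = llm(refine_template.format(existing_answer=answer, text=doc))
    return answer


refine_calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: records the prompt, returns a label."""
    refine_calls.append(prompt)
    return f"summary-{len(refine_calls)}"


final_answer = refine(
    ["d1", "d2", "d3"],
    fake_llm,
    initial_template="Summarize: {text}",
    refine_template="Existing: {existing_answer}\nNew: {text}",
)
print(final_answer, len(refine_calls))
```

Because every call consumes the previous call's output, the number of LLM calls equals the number of documents and none of them can be issued concurrently.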

In [13]:
refine_chain = load_summarize_chain(llm, chain_type="refine")

In [14]:
print(refine_chain.refine_llm_chain.prompt.template)

Your job is to produce a final summary.
We have provided an existing summary up to a certain point: {existing_answer}
We have the opportunity to refine the existing summary (only if needed) with some more context below.
------------
{text}
------------
Given the new context, refine the original summary.
If the context isn't useful, return the original summary.


In [15]:
output_summary = refine_chain.run(docs[17:25])
print(output_summary)


The author emphasizes the importance of self-respect and finding happiness within, rather than seeking validation from others or getting caught up in worldly distractions. They also discuss the fleeting nature of all things and the insignificance of material possessions in the grand scheme of the universe. The concept of death and one's own mortality is explored, with the reminder that it is a natural part of life and should not be feared. The author urges readers to consider their place in the universe and connection to a higher power, and to live in accordance with their true nature while avoiding negative emotions and sentiments towards others. They also highlight the dangers of allowing oneself to be consumed by anger, desire, or pleasure without consideration for the common good. The brevity and uncertainty of life is emphasized, and the author encourages readers to make the most of each moment through philosophy, which involves preserving one's spirit, embracing all experiences,

> # `Map Reduce` Chain

* Processes large amounts of data efficiently
* Map step:
    * Applies an LLM chain to each document individually, producing a new document for each
* Reduce step:
    * Passes the new documents to a separate combine-documents chain to get a single output
* If necessary, the mapped documents are first compressed so that they fit into the combine-documents chain (this compression is performed recursively)
* Runs an initial prompt on each chunk of data (this is the map step)
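
The map, collapse, and reduce steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not LangChain's `MapReduceDocumentsChain`: `map_reduce`, `group_by_size`, and `stub_llm` are hypothetical names, the size budget is in characters rather than tokens, and the stub model always returns the same short string.

```python
from typing import Callable, List

SEP = "\n\n"


def group_by_size(items: List[str], max_chars: int) -> List[List[str]]:
    """Greedily pack items into groups whose joined length fits max_chars."""
    groups: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for item in items:
        added = len(item) + (len(SEP) if current else 0)
        if current and current_len + added > max_chars:
            groups.append(current)
            current, current_len, added = [], 0, len(item)
        current.append(item)
        current_len += added
    if current:
        groups.append(current)
    return groups


def map_reduce(docs: List[str], llm: Callable[[str], str],
               map_template: str, reduce_template: str, max_chars: int) -> str:
    # Map step: independent calls, one per document (could run in parallel).
    summaries = [llm(map_template.format(text=d)) for d in docs]
    # Collapse step: repeat until the summaries fit in one reduce prompt.
    while len(summaries) > 1 and len(SEP.join(summaries)) > max_chars:
        groups = group_by_size(summaries, max_chars)
        if len(groups) == len(summaries):
            break  # nothing can be merged any further
        summaries = [llm(reduce_template.format(text=SEP.join(g))) for g in groups]
    # Reduce step: a single final call over the remaining summaries.
    return llm(reduce_template.format(text=SEP.join(summaries)))


mr_calls: List[str] = []


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: records the prompt, returns 'sum'."""
    mr_calls.append(prompt)
    return "sum"


mr_result = map_reduce(["a", "b", "c", "d"], stub_llm, "M:{text}", "R:{text}", max_chars=8)
print(mr_result, len(mr_calls))
```

Here the four map outputs do not fit in the budget together, so they are collapsed into two intermediate summaries before the final reduce call.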



In [16]:
map_reduce_chain = load_summarize_chain(llm,
                                        chain_type="map_reduce",
                                        verbose=True)

In [17]:
print(map_reduce_chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [18]:
print(map_reduce_chain.combine_document_chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [19]:
# just using the first 20 chunks as I don't want to run too long
output_summary = map_reduce_chain.run(docs[:20])

print(output_summary)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"THE FIRST BOOK
I. Of my grandfather Verus I have learned to be gentle and meek, and to
refrain from all anger and passion. From the fame and memory of him that
begot me I have learned both shamefastness and manlike behaviour. Of my
mother I have learned to be religious, and bountiful; and to forbear,
not only to do, but to intend any evil; to content myself with a spare
diet, and to fly all such excess as is incidental to great wealth. Of my
great-grandfather, both to frequent public schools and auditories, and
to get me good and able teachers at home; and that I ought not to think
much, if upon such occasions, I were at excessive charges.
II. Of him that brought me up, not to be fondly addicted to either of
the two great factions of the coursers in the circus, called Prasini,
and Veneti: nor in the amphi

In [20]:
! pip freeze > requirements.txt