# It's Alive! LangChain Summarization with Map-Reduce and Refine

This guide will roughly follow the [Summarization How-To from the LangChain Docs](https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html).

## Setup

First, install the dependencies and load the OpenAI API key from a .env file into the environment:

In [1]:
%pip install -r requirements.txt


In [3]:
from dotenv import load_dotenv
load_dotenv()

True
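
If you're curious what `load_dotenv()` does here, the sketch below is a deliberately simplified stand-in (the `load_env_file` name is hypothetical and not part of python-dotenv). It assumes a plain `KEY=VALUE` file with a line like `OPENAI_API_KEY=...`, which is the environment variable LangChain's `OpenAI` wrapper reads. Like python-dotenv's default, it doesn't override variables that are already set; the real library also handles `export` prefixes, multi-line values, and variable expansion.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> bool:
    """Simplified, hypothetical stand-in for python-dotenv's load_dotenv.

    Reads KEY=VALUE lines into os.environ without overriding variables
    that are already set. Returns True if the file was found.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return False
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that isn't KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes; setdefault keeps any existing value.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
    return True
```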

### Loading in the book Frankenstein

In [10]:
from langchain.document_loaders import GutenbergLoader
loader = GutenbergLoader("https://www.gutenberg.org/cache/epub/84/pg84.txt")

In [11]:
data = loader.load()

In [14]:
data[0].page_content[:300]

'The Project Gutenberg eBook of Frankenstein, by Mary Wollstonecraft Shelley\r\n\n\n\r\n\n\nThis eBook is for the use of anyone anywhere in the United States and\r\n\n\nmost other parts of the world at no cost and with almost no restrictions\r\n\n\nwhatsoever. You may copy it, give it away or re-use it under the ter'

### Splitting the text into smaller chunks

In [15]:
from langchain.text_splitter import CharacterTextSplitter
import tiktoken

In [32]:
enc = tiktoken.encoding_for_model("text-davinci-003")

def _tiktoken_encoder(text, **kwargs):
    # Measure length in model tokens rather than characters
    return len(enc.encode(text, **kwargs))

text_splitter = CharacterTextSplitter(
    chunk_size=1000,  # max tokens per chunk, as counted by _tiktoken_encoder
    chunk_overlap=0,
    separator="\r\n\n\n\r\n\n\n",  # Project Gutenberg paragraph separator
    length_function=_tiktoken_encoder,
)
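
Conceptually, the splitter above cuts the text at every separator and then packs consecutive pieces into a chunk until the next piece would push it past `chunk_size`, as measured by `length_function`. Here's a minimal sketch of that greedy packing with `chunk_overlap=0`. It assumes a whitespace word count in place of the tiktoken encoder, and `greedy_split` and `word_count` are hypothetical names; it's a simplification, not LangChain's actual implementation.

```python
from typing import Callable, List

# Project Gutenberg paragraph separator used in the splitter above.
SEPARATOR = "\r\n\n\n\r\n\n\n"


def greedy_split(
    text: str,
    chunk_size: int,
    separator: str = SEPARATOR,
    length_function: Callable[[str], int] = len,
) -> List[str]:
    """Split on `separator`, then greedily pack pieces into chunks.

    A piece is added to the current chunk unless doing so would exceed
    `chunk_size`. In this sketch a single oversized piece is kept whole.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and length_function(candidate) > chunk_size:
            # Close the current chunk and start a new one with this piece.
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


def word_count(text: str) -> int:
    # Stand-in for _tiktoken_encoder: counts whitespace-separated words.
    return len(text.split())


sample = SEPARATOR.join(["one two three", "four five", "six seven eight nine"])
chunks = greedy_split(sample, chunk_size=5, length_function=word_count)
print(len(chunks))  # 2
```

With `chunk_size=5`, the first two paragraphs (3 + 2 words) share a chunk and the third (4 words) starts a new one.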

In [38]:
docs = text_splitter.split_documents(data)

In [39]:
len(docs)

138

In [40]:
docs[3]

Document(page_content='But I have one want which I have never yet been able to satisfy, and the\r\n\n\nabsence of the object of which I now feel as a most severe evil, I have no\r\n\n\nfriend, Margaret: when I am glowing with the enthusiasm of success, there\r\n\n\nwill be none to participate my joy; if I am assailed by disappointment, no\r\n\n\none will endeavour to sustain me in dejection. I shall commit my thoughts\r\n\n\nto paper, it is true; but that is a poor medium for the communication of\r\n\n\nfeeling. I desire the company of a man who could sympathise with me, whose\r\n\n\neyes would reply to mine. You may deem me romantic, my dear sister, but I\r\n\n\nbitterly feel the want of a friend. I have no one near me, gentle yet\r\n\n\ncourageous, possessed of a cultivated as well as of a capacious mind, whose\r\n\n\ntastes are like my own, to approve or amend my plans. How would such a\r\n\n\nfriend repair the faults of your poor brother! I am too ardent in execution\r\n\n\nand too

In [41]:
# save your pennies: only summarize the first 10 chunks
docs = docs[:10]

### OpenAI Setup

In [65]:
from langchain import OpenAI
llm = OpenAI(temperature=0, max_tokens=1024)

## Summarize with Map-Reduce

In [45]:
from langchain import PromptTemplate, LLMChain
from langchain.chains.summarize import load_summarize_chain

In [72]:
chain = load_summarize_chain(llm, chain_type="map_reduce")
map_reduce_summary = chain.run(docs)
map_reduce_summary


The above took 23 seconds to run and produced the following output:

> Robert Walton is a self-educated man of twenty-eight who is embarking on a voyage of discovery to the Arctic. He has hired a vessel and is collecting sailors for his voyage, which he plans to begin in June. He has found a master for his ship who is known for his gentleness and mild discipline, and he has also found a stranger on the ocean who is gradually improving in health. The stranger shares his story with Walton, expressing his grief over his misfortunes and his appreciation of the beauty of nature. Walton hopes that the stranger's story will be useful to him in his own pursuits.
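
Under the hood, map-reduce summarizes each chunk independently (the map step) and then summarizes those partial summaries together (the combine step), collapsing them first if they're too long to combine at once. The sketch below illustrates that control flow with a stand-in `summarize` function instead of an LLM. The function names and the character budget are hypothetical, and LangChain's own implementation differs in detail: it budgets in tokens, for example, and may batch the map calls rather than using threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    summarize: Callable[[str], str],
    max_combined_chars: int = 4000,
) -> str:
    """Sketch of map-reduce summarization with a collapse step.

    `summarize` is a hypothetical stand-in for an LLM call.
    """
    if not chunks:
        return ""

    # Map: each chunk is independent, so the calls can run concurrently.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarize, chunks))  # keeps input order

    # Collapse: re-summarize pairs until the combined text fits the budget.
    # Each pass halves the number of summaries, so the loop terminates.
    while len("\n".join(summaries)) > max_combined_chars and len(summaries) > 1:
        groups = [summaries[i:i + 2] for i in range(0, len(summaries), 2)]
        summaries = [summarize("\n".join(group)) for group in groups]

    # Combine: one final call over all remaining partial summaries.
    return summarize("\n".join(summaries))
```

Passing a function that wraps your LLM call as `summarize` turns this into a working, if naive, summarizer.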

### Customizing Map-Reduce (just prompts)

Let's say we want a bulleted list where each chunk is summarized into its own bullet.

In [56]:
map_prompt = """Write a one sentence summary of the following:

{text}

ONE-SENTENCE SUMMARY:"""
combine_prompt = """Combine the following sentences using bullet points in markdown format:

{text}

BULLETED LIST:"""
MAP_PROMPT = PromptTemplate(template=map_prompt, input_variables=["text"])
COMBINE_PROMPT = PromptTemplate(template=combine_prompt, input_variables=["text"])
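
To see exactly what the model receives, you can render the templates yourself. This sketch uses plain `str.format`, which for a simple `{text}` variable should match what `PromptTemplate`'s default f-string formatting produces. The sample sentences are made up, and joining the partial summaries with blank lines is an assumption about how the combine step stitches them together.

```python
map_prompt = """Write a one sentence summary of the following:

{text}

ONE-SENTENCE SUMMARY:"""
combine_prompt = """Combine the following sentences using bullet points in markdown format:

{text}

BULLETED LIST:"""

# Each chunk is dropped into the map prompt on its own.
map_input = map_prompt.format(text="Walton writes to his sister from St. Petersburg.")

# The combine prompt sees all the one-sentence summaries at once
# (joined with blank lines here, which is an assumption).
partial_summaries = ["Walton hires a ship.", "Walton rescues a stranger."]
combine_input = combine_prompt.format(text="\n\n".join(partial_summaries))

print(map_input)
print(combine_input)
```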

In [67]:
chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=MAP_PROMPT, combine_prompt=COMBINE_PROMPT)
chain.run(docs)


This one took about 30 seconds and produced the following output:

- Mary Wollstonecraft Shelley's novel "Frankenstein; or, the Modern Prometheus" follows the journey of a man who embarks on an enterprise with no disastrous consequences.
- Robert Walton has taken a second step towards his enterprise by hiring a vessel and collecting sailors who possess dauntless courage.
- Despite his lack of education and lack of a friend to share his joys and sorrows, the protagonist finds a companion in his lieutenant, an Englishman with noble endowments of humanity.
- A youth who has been raised in solitude is determined to find a master for his ship who is known for his gentleness, integrity, and courage, and finds one in the form of a noble, uneducated mariner who sacrificed his own happiness for the sake of his beloved.
- Robert Walton writes to his sister to assure her of his safety and determination to succeed in his voyage, despite the dangers he faces in the unexplored regions he is exploring.
- While on a voyage of discovery towards the northern pole, the crew of a ship encounter a mysterious giant in a sledge, and later rescue a European man from the brink of destruction.
- After being rescued from a perilous situation, the stranger is eager to find the man who fled from him, and is comforted by the possibility that he may have reached safety before the ice broke.
- Despite his own grief, the stranger has become a source of comfort and understanding for the narrator, who finds himself increasingly drawn to the stranger and his story.
- The stranger, who has suffered great misfortunes, offers to tell his story to Captain Walton, hoping that it will provide a moral lesson and console him in case of failure in his own undertaking.

### Map-Reduce with Custom Chains

Our output above could be truncated depending on our token limit (max_tokens=1024 here). There are many ways to fix this (increasing the token limit, targeting a model with a larger context window, asking for something shorter, etc.).

In this case, though, the reduce step only asks the LLM to concatenate the one-sentence summaries into a list. There's no real need to have an LLM do that.

Let's implement a custom chain that concatenates the provided docs itself.

In [61]:
from langchain.docstore.document import Document
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain

from typing import List, Optional, Any, Tuple

class ConcatenateChain(BaseCombineDocumentsChain):
    """Combine step that joins documents into a markdown bulleted list without calling an LLM."""

    def combine_docs(self, docs: List[Document], **kwargs: Any) -> Tuple[str, dict]:
        # Strip each summary individually so leading spaces from the model don't produce "-  " bullets
        bullets = "\n".join(f"- {doc.page_content.strip()}" for doc in docs)
        return bullets, {}

    async def acombine_docs(self, docs: List[Document], **kwargs: Any) -> Tuple[str, dict]:
        # No I/O involved, so the async version reuses the sync implementation
        return self.combine_docs(docs, **kwargs)

    def prompt_length(self, docs: List[Document], **kwargs: Any) -> Optional[int]:
        """Return the prompt length given the documents passed in.
        Returns None because this chain doesn't use a prompt.
        """
        return None

    @property
    def _chain_type(self) -> str:
        return "concatenate_chain"

In [62]:
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
map_chain = LLMChain(llm=llm, prompt=MAP_PROMPT)
map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    combine_document_chain=ConcatenateChain()
)

In [63]:
map_reduce_chain.run(docs)


The above took 6 seconds to run and produced the following:

- Mary Wollstonecraft Shelley's novel "Frankenstein; or, the Modern Prometheus" follows the journey of a man who embarks on an enterprise with no disastrous consequences.
-  Despite his initial disappointment, the protagonist is inspired by the cold northern breeze to embark on a voyage of discovery to the North Pacific Ocean, driven by his passion for knowledge and the promise of a land of beauty and delight.
-  Robert Walton has taken a second step towards his enterprise by hiring a vessel and collecting sailors who possess dauntless courage.
-  Despite his lack of formal education, the protagonist is in search of a friend who can sympathize with him and help him regulate his mind, and he finds a potential companion in his lieutenant, an Englishman with noble endowments of humanity.
-  A youth who has been raised in solitude is determined to find a master for his ship who is known for his gentleness, integrity, and courage, and finds one in the form of a noble, uneducated mariner who sacrificed his own happiness for the sake of his beloved.
-  Robert Walton writes to his sister to assure her of his safety and determination to succeed in his voyage, despite the dangers he faces in the unexplored regions of the sea.
-  While on a voyage of discovery towards the northern pole, the crew of a ship encounter a mysterious giant in a sledge, and later rescue a European man from the brink of destruction.
-  After being rescued from a perilous situation, the stranger is eager to find the man who fled from him, and is comforted by the possibility that he may have reached safety before the ice broke.
-  Despite his own grief, the stranger has become a source of comfort and understanding for the narrator, who finds himself increasingly drawn to the stranger and his story.
-  The stranger, who has suffered great misfortunes, offers to tell his story to Captain Walton, hoping that it will provide a moral lesson and console him in case of failure in his own undertaking.
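
Stripped of LangChain's plumbing, the reduce step in `ConcatenateChain` is ordinary string joining, so this run only pays for the map calls, which is likely why it finished so much faster. A minimal sketch (the `concatenate_as_bullets` name is hypothetical): it strips each summary individually, so a leading space in a model's completion doesn't turn into a double space after the dash, and it skips empty summaries.

```python
from typing import List


def concatenate_as_bullets(summaries: List[str]) -> str:
    """Join partial summaries into a markdown bulleted list, no LLM needed."""
    return "\n".join(f"- {s.strip()}" for s in summaries if s.strip())


print(concatenate_as_bullets([" Walton hires a ship.", " He rescues a stranger. "]))
```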

## Refine

Let's now compare map-reduce with the refine chain. Refine creates an initial summary from the first chunk and then updates it with each subsequent chunk.

In [68]:
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(docs)


The above took 2 minutes and 50 seconds to run. A key downside of refine is that it must run sequentially, since each step needs the previous summary. The map step of map-reduce summarizes each chunk independently, so it can run in parallel, which makes map-reduce significantly faster than refine on many documents (even when collapse steps are included).

Here's the output:

> The narrator, writing from St. Petersburg, informs his sister that he has arrived safely and is confident in the success of his undertaking. He is the author of the novel Frankenstein; or, The Modern Prometheus, by Mary Wollstonecraft Shelley. He is inspired by the cold northern breeze and his daydreams of discovering a land of beauty and delight beyond the pole. He has dedicated himself to this great enterprise by inuring his body to hardship, accompanying whale-fishers on several expeditions to the North Sea, and voluntarily enduring cold, famine, thirst, and want of sleep. He has twice hired himself as an under-mate in a Greenland whaler and is now preparing to depart for Archangel in a fortnight or three weeks to hire a ship and engage sailors for his voyage. He has secured the services of a master of an excellent disposition and remarkable gentleness and mildness of discipline, whom he heard of in a romantic manner from a lady who owes him the happiness of her life. He hopes to discover the wondrous power that attracts the needle and to regulate celestial observations, and to confer an inestimable benefit on all mankind by discovering a passage near the pole. He expresses his loneliness and lack of companionship, and mentions his lieutenant, an Englishman of great courage and enterprise, whom he met on board a whale vessel and has since employed to assist in his enterprise. He is filled with a passionate enthusiasm for the dangerous mysteries of the ocean and is determined to explore the untamed yet obedient element. He is prepared to face the dangers of the voyage and is hopeful of success, despite the potential risks. After being surrounded by ice and a thick fog, the narrator and his crew witness a strange sight of a low carriage, fixed on a sledge and drawn by dogs, pass on towards the north. They then encounter a European man in a sledge who questions them about their voyage. 
The narrator reveals that they are on a voyage of discovery towards the northern pole. He expresses his love for his sister and assures her that he will be prudent and cautious in his endeavors. The stranger, who was nearly frozen and emaciated by fatigue and suffering, was brought on board and restored to life. He was filled with a deep gloom and revealed that he was pursuing someone who had fled from him. The narrator and his crew then recall seeing a man in a sledge being pulled by dogs the day before. The stranger was filled with eagerness to be on deck to watch for the sledge, but was too weak to sustain the rawness of the atmosphere. The narrator promises to watch for him and give him instant notice if any new object should appear in sight. The narrator's affection for the stranger grows as he recovers from his illness and the narrator shares his own plans and ambitions with him. The stranger is overwhelmed with emotion and reveals his own story of pursuit and despair. The narrator is moved by the stranger's plight and offers him friendship and understanding. The narrator is determined to explore the untamed yet obedient element of the ocean and is hopeful of success, despite the potential risks. He is also determined to help the stranger in his pursuit and to provide him with the comfort and companionship he needs. The narrator and the stranger share a deep bond of understanding and the narrator is filled with admiration for the stranger's courage and resilience. The narrator is determined to help the stranger in his pursuit and to provide him with the comfort and companionship he needs. He is hopeful that his voyage will be successful and that he will be able to help the stranger in his quest.
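
The sequential dependency is easiest to see in a stripped-down loop. In this sketch, `summarize` and `refine` are hypothetical stand-ins for LLM calls: the first chunk gets an initial summary, and every later chunk is folded in by a call that sees the running summary plus the new text. With the 10 chunks used here, that's 10 calls that have to happen one after another.

```python
from typing import Callable, List


def refine_summarize(
    chunks: List[str],
    summarize: Callable[[str], str],
    refine: Callable[[str, str], str],
) -> str:
    """Sketch of the refine strategy with stand-in summarize/refine calls."""
    if not chunks:
        return ""
    summary = summarize(chunks[0])
    for chunk in chunks[1:]:
        # Each call needs the previous result, so this loop can't be parallelized.
        summary = refine(summary, chunk)
    return summary


steps = refine_summarize(["a", "b", "c"], str.upper, lambda s, t: f"{s}+{t.upper()}")
print(steps)  # A+B+C
```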


### Custom Refine Prompt

Refine shows some promise, but this summary is far too long. One advantage of map-reduce is that its collapse and combine steps each use a prompt that asks for a concise summary.

The default refine prompt doesn't emphasize keeping the summary concise. Let's change that.

In [69]:
REFINE_PROMPT_TMPL = (
    "Your job is to produce a final concise summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary\n"
    "If the context isn't useful, return the original summary.\n"
    "Make sure the result is concise."
)
REFINE_PROMPT = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=REFINE_PROMPT_TMPL,
)
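
When building a template from adjacent string literals, as REFINE_PROMPT_TMPL does, Python joins the literals with nothing in between, so every space and newline has to be written explicitly. The sketch below shows that pitfall and then renders a cut-down, hypothetical version of the refine template to show what a single refine step sends: the running summary plus the next chunk. The sample values are made up.

```python
# Python joins adjacent string literals with no separator, so a missing
# trailing space silently glues words together.
glued = (
    "refine the existing summary"
    "(only if needed)"
)
spaced = (
    "refine the existing summary "
    "(only if needed)"
)
print(glued)   # refine the existing summary(only if needed)
print(spaced)  # refine the existing summary (only if needed)

# A cut-down refine template (hypothetical), rendered for one step.
template = (
    "Your job is to produce a final concise summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Make sure the result is concise."
)
rendered = template.format(existing_answer="Walton sails north.", text="He rescues a stranger.")
print(rendered)
```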

In [71]:
refine_chain = load_summarize_chain(llm, chain_type="refine", refine_prompt=REFINE_PROMPT)
refine_summary = refine_chain.run(docs)
refine_summary


The above took 2 minutes and 14 seconds to complete and produced the following:

> The narrator, writing from St. Petersburg, assures his sister of his safety and optimism about the success of his undertaking. He is the author of Frankenstein; or, The Modern Prometheus, and is driven by his passion for reading and ambition to obtain a niche in the temple of great poets. He has dedicated himself to a great enterprise of discovering a passage near the pole to those countries, to reach which at present so many months are requisite, or to ascertain the secret of the magnet. He has inured his body to hardship, accompanying whale-fishers on several expeditions to the North Sea and voluntarily enduring cold, famine, thirst, and want of sleep. He has hired a vessel and is collecting sailors for his voyage to the North Pacific Ocean, set to depart in a fortnight or three weeks. Despite his ambition, he laments his lack of a friend to share his joys and sorrows, and has hired an English lieutenant of excellent disposition and remarkable gentleness and mildness of discipline to assist him in his enterprise. He has also encountered a stranger in a wretched condition, whom he has taken on board and cared for. The stranger is searching for someone who fled from him, and the narrator believes they may have seen him in a sledge across the ice. Despite the stranger's silence and unease around others, the narrator has grown to love him like a brother and is deeply moved by his grief. He is also interested in the narrator's project and encourages him to pursue it, expressing his own grief and admiration for the beauty of nature.

## GPT: Grade Thyself

Let's end with having the LLM compare the two final summaries (map-reduce vs refine):

In [73]:
llm("""Compare the following two summaries:
1. MAP_REDUCE_SUMMARY: {map_reduce_summary}

2. REFINE_SUMMARY: {refine_summary}

Comparison:""".format(map_reduce_summary=map_reduce_summary, refine_summary=refine_summary))


The above took 10 seconds and produced the following:

> The two summaries both describe the narrator's journey to the Arctic and his encounters with a stranger. The MAP_REDUCE_SUMMARY focuses on the details of the narrator's preparations for the voyage, such as hiring a vessel and collecting sailors, while the REFINE_SUMMARY focuses on the narrator's motivations for the voyage, his physical endurance, and his relationship with the stranger. The REFINE_SUMMARY also provides more detail about the stranger's story and his appreciation of nature.

## Prompt Reference

Here are some links to the default prompts for reference:

- [map_reduce_prompt.py](https://github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/map_reduce_prompt.py)
- [refine_prompts.py](https://github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/refine_prompts.py)