# Summarization

This notebook walks through how to use LangChain to summarize a list of documents. It covers three chain types: `stuff`, `map_reduce`, and `refine`. For a more in-depth explanation of these chain types, see [here](https://docs.langchain.com/docs/components/chains/index_related_chains).

## Prepare Data
First we prepare the data. For this example we create multiple documents from one long one, but these documents could be fetched in any manner (the point of this notebook is to highlight what to do AFTER you fetch the documents).

In [1]:
import config  # local module, not shown in this notebook
from langchain import OpenAI, LLMChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0)

text_splitter = CharacterTextSplitter()

In [2]:
# Read the movie-recap transcript and split it into chunks.
with open("assets/movierecaps000.txt") as f:
    transcript = f.read()
texts = text_splitter.split_text(transcript)
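
The splitting step can be pictured as greedy packing: cut the text on a separator, then merge neighbouring pieces until a size limit is reached. The sketch below is a plain-Python illustration of that idea, not `CharacterTextSplitter` itself; the function name, the `"\n\n"` separator, and the `chunk_size` default are all assumptions chosen for the example.

```python
from typing import List


def split_text(text: str, separator: str = "\n\n", chunk_size: int = 4000) -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    The defaults are placeholders, not confirmed library defaults.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would overflow: close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
print(split_text(sample, chunk_size=10))  # ['aaaa\n\nbbbb', 'cccc']
```

A smaller `chunk_size` yields more, shorter chunks, which in turn means more model calls for the `map_reduce` and `refine` chains.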

In [3]:
import pprint
type(texts), len(texts)
pprint.pprint(texts[0], width=80, indent=0)

("<0.0>  Hey, what's up guys! <8.0>  Today I'll show you an action thriller "
 'film, Venom Part 2, Let There Be Carnage. <13.08>  Spoiler ahead, watch out '
 'and take care! <15.76>  The movie begins in 1996 with two young lovers, '
 'Cletus and Francis, living at a reform school <21.48>  in California. '
 '<22.66>  Their rooms are directly above each other, and they use a pipe to '
 "communicate with one <26.240000000000002>  another. <27.24>  As Francis' "
 'ultrasonic power grows stronger, the school sends her to a top-secret '
 'institute <31.799999999999997>  to lock her powers forever. <33.6>  From the '
 'window, Cletus angrily watches as the police drag Francis into a van guarded '
 '<38.26>  by a detective. <39.72>  During the trip, Francis shouts loud '
 "enough to damage detective's left ear. <43.68>  The van falls sideward when "
 'Francis tries to grab the gun from detective, but he shoots '
 '<48.239999999999995>  her first to stop her escape. <50.28>  The bullets '
 "im

In [4]:
from langchain.docstore.document import Document

docs = [Document(page_content=t) for t in texts[:3]]

In [5]:
print(type(docs[0]))
aja = docs[0]
aja

<class 'langchain.schema.Document'>


Document(page_content="<0.0>  Hey, what's up guys! <8.0>  Today I'll show you an action thriller film, Venom Part 2, Let There Be Carnage. <13.08>  Spoiler ahead, watch out and take care! <15.76>  The movie begins in 1996 with two young lovers, Cletus and Francis, living at a reform school <21.48>  in California. <22.66>  Their rooms are directly above each other, and they use a pipe to communicate with one <26.240000000000002>  another. <27.24>  As Francis' ultrasonic power grows stronger, the school sends her to a top-secret institute <31.799999999999997>  to lock her powers forever. <33.6>  From the window, Cletus angrily watches as the police drag Francis into a van guarded <38.26>  by a detective. <39.72>  During the trip, Francis shouts loud enough to damage detective's left ear. <43.68>  The van falls sideward when Francis tries to grab the gun from detective, but he shoots <48.239999999999995>  her first to stop her escape. <50.28>  The bullets impact, knocks Francis onto the s

In [6]:
from langchain.chains.summarize import load_summarize_chain

## Quickstart
If you just want to get started as quickly as possible, this is the recommended way to do it:

In [8]:
chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = chain.run(docs)
pprint.pprint(summary, width=80, indent=0)

' Two lovers, Cletus and Francis, are separated when Francis is taken away to a secret institute. 25 years later, Eddie is asked to visit Cletus in prison and discovers carvings in his cell room. With the help of Venom, Eddie is able to locate the missing bodies at Rodeo Beach, solving the case.'

If you want more control over, and a better understanding of, what is happening, see the sections below.

## The `stuff` Chain

This section shows the results of using the `stuff` chain for summarization.
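
Conceptually, `stuff` places every document into a single prompt and makes one model call. The sketch below illustrates that idea in plain Python; it is not LangChain's implementation. `stuff_summarize` and `fake_llm` are hypothetical names, `fake_llm` stands in for a real model call, and the separator and closing cue in the prompt are assumptions.

```python
from typing import Callable, List


def stuff_summarize(docs: List[str], llm: Callable[[str], str]) -> str:
    """Put every document into one prompt and call the model once."""
    joined = "\n\n".join(docs)  # separator is an assumption
    prompt = (
        "Write a concise summary of the following:\n\n"
        f"{joined}\n\n"
        "CONCISE SUMMARY:"  # closing cue is an assumption
    )
    return llm(prompt)


prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: records the prompt."""
    prompts_seen.append(prompt)
    return f"summary built from {len(prompt)} characters"


result = stuff_summarize(["first chunk", "second chunk"], fake_llm)
print(len(prompts_seen))  # 1: a single call covers both chunks
```

The trade-off is that the combined text has to fit in the model's context window, which is why the other two chain types exist.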

In [9]:
chain = load_summarize_chain(llm, chain_type="stuff")

In [10]:
summary = chain.run(docs)
pprint.pprint(summary, width=80, indent=0)

" In Venom Part 2, Let There Be Carnage, two young lovers, Cletus and Francis, are separated when Francis is taken to a top-secret institute to lock her powers away. Twenty-five years later, Cletus is about to be convicted of murder and Eddie is asked to visit him in prison. Eddie discovers carvings in Cletus' cell room and, with the help of Venom, is able to pinpoint the location of the missing buried bodies at Rodeo Beach, solving the case overnight."

**Custom Prompts**

You can also use your own prompts with this chain. In this example, the prompt asks the model to keep the `<>` timestamp delimiters from the source text.

In [11]:
prompt_template = """Write a concise summary of the following:


{text}


CONCISE WHILE RETAINING TIMESTAMPT DELIMITER <> OF THE TEXT:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
summary = chain.run(docs)
pprint.pprint(summary, width=80, indent=0)

('\n'
 '\n'
 'Two young lovers, Cletus and Francis, living at a reform school in '
 'California have their lives changed when Francis is sent to a top-secret '
 'institute to lock her powers forever. Twenty-five years later, Cletus is '
 'convicted of death and Eddie is asked to visit him in prison. Eddie '
 "discovers Cletus' carvings in his cell room and Venom deciphers them to be a "
 'poem and a location. Eddie pinpoints the missing buried bodies at Rodeo '
 'Beach to the police, solving the case overnight.')
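
One way to check whether a custom prompt had the intended effect is to count the timestamp markers that survive in the summary. The helper below is a small sketch for that; `find_timestamps` and the regular expression are assumptions based on the `<0.0>`-style markers visible in the transcript, and the two sample strings are short excerpts from the text and output above.

```python
import re
from typing import List

# Matches markers such as <0.0> or <26.240000000000002>, and also <8>.
TIMESTAMP = re.compile(r"<(\d+(?:\.\d+)?)>")


def find_timestamps(text: str) -> List[float]:
    """Return every timestamp marker in the text as a float, in order."""
    return [float(m) for m in TIMESTAMP.findall(text)]


source = "<0.0>  Hey, what's up guys! <8.0>  Today I'll show you an action thriller"
summary = "Two young lovers, Cletus and Francis, living at a reform school"

print(find_timestamps(source))   # [0.0, 8.0]
print(find_timestamps(summary))  # []
```

An empty list for the summary, as here, indicates that the model dropped the markers despite the instruction in the prompt.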


## The `map_reduce` Chain

This section shows the results of using the `map_reduce` chain for summarization.
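
Conceptually, `map_reduce` summarizes each document on its own (the map step) and then summarizes those summaries (the reduce step). The sketch below shows the idea in plain Python with a single reduce pass; it is not LangChain's implementation. `map_reduce_summarize` and `fake_llm` are hypothetical names, and `fake_llm` is a toy stand-in that simply keeps the first five words of its input.

```python
from typing import Callable, List, Tuple

INSTRUCTION = "Write a concise summary of the following:\n\n"


def map_reduce_summarize(
    docs: List[str], llm: Callable[[str], str]
) -> Tuple[List[str], str]:
    """Summarize each document separately, then summarize the summaries."""
    # Map step: one independent call per document.
    partial = [llm(INSTRUCTION + d) for d in docs]
    # Reduce step: one call over the combined partial summaries.
    final = llm(INSTRUCTION + "\n\n".join(partial))
    return partial, final


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in: keeps the first five words of the text to summarize."""
    body = prompt.split("\n\n", 1)[1]
    return " ".join(body.split()[:5])


docs_demo = [
    "alpha beta gamma delta epsilon zeta",
    "one two three four five six seven",
]
steps, final = map_reduce_summarize(docs_demo, fake_llm)
print(steps)  # ['alpha beta gamma delta epsilon', 'one two three four five']
print(final)  # alpha beta gamma delta epsilon
```

Because the map calls are independent of one another, they can run in parallel; the cost is that each partial summary is written without knowledge of the other documents.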

In [12]:
chain = load_summarize_chain(llm, chain_type="map_reduce")

In [13]:
summary = chain.run(docs)
pprint.pprint(summary, width=80, indent=0)

(' Two lovers, Cletus and Francis, are separated when Francis is taken away to '
 'a secret institute. 25 years later, Eddie is asked to visit Cletus in prison '
 'and discovers carvings in his cell room. With the help of Venom, Eddie is '
 'able to locate the missing bodies at Rodeo Beach, solving the case.')


**Intermediate Steps**

We can also return the intermediate steps for `map_reduce` chains, should we want to inspect them. This is done with the `return_intermediate_steps` argument.

In [14]:
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True)

In [15]:
out = chain({"input_documents": docs}, return_only_outputs=True)
pprint.pprint(out['intermediate_steps'], width=80, indent=1)
pprint.pprint(out['output_text'], width=80, indent=0)

{'intermediate_steps': [" In Venom Part 2, Let There Be Carnage, two young lovers, Cletus and Francis, are separated when Francis is taken to a top-secret institute to lock her powers away. Twenty-five years later, Cletus is about to be convicted of murder and Eddie is asked to visit him in prison. Eddie discovers carvings in Cletus' cell room and, with the help of Venom, is able to pinpoint the location of the missing buried bodies at Rodeo Beach, solving the case overnight."],
 'output_text': ' Two lovers, Cletus and Francis, are separated when Francis is taken away to a secret institute. 25 years later, Eddie is asked to visit Cletus in prison and discovers carvings in his cell room. With the help of Venom, Eddie is able to locate the missing bodies at Rodeo Beach, solving the case.'}

**Custom Prompts**

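You can also use your own prompts with this chain. In this example, the same prompt is used for both the map and the combine step, and it asks the model to keep the `<>` timestamp delimiters from the source text.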

In [19]:
prompt_template = """Write a concise summary of the following:


{text}


CONCISE WHILE RETAINING TIMESTAMPT DELIMITER <> OF THE TEXT:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True, map_prompt=PROMPT, combine_prompt=PROMPT)
out = chain({"input_documents": docs}, return_only_outputs=True)
pprint.pprint(out['intermediate_steps'], width=80, indent=0)
pprint.pprint(out['output_text'], width=80, indent=0)

['\n'
'\n'
'Two young lovers, Cletus and Francis, living at a reform school in California '
'have their lives changed when Francis is sent to a top-secret institute to '
'lock her powers forever. Twenty-five years later, Cletus is convicted of '
"death and Eddie is asked to visit him in prison. Eddie discovers Cletus' "
'carvings in his cell room and Venom deciphers them to be a poem and a beach. '
'Eddie pinpoints the missing buried bodies at Rodeo Beach to the police, '
'solving the case overnight.']
('\n'
 '\n'
 'Two young lovers, Cletus and Francis, living at a reform school in '
 'California have their lives changed when Francis is sent away. Twenty-five '
 'years later, Cletus is convicted of death and Eddie is asked to visit him in '
 "prison. Eddie discovers Cletus' carvings in his cell room and Venom "
 'deciphers them to be a poem and a beach. Eddie pinpoints the missing buried '
 'bodies at Rodeo Beach to the police, solving the case overnight <>')


## The `refine` Chain

This section shows the results of using the `refine` chain for summarization.
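
Conceptually, `refine` summarizes the first document and then walks through the remaining ones in order, each time asking the model to update the running summary with the new text. The sketch below shows that loop in plain Python; it is not LangChain's implementation. `refine_summarize` and `fake_llm` are hypothetical names, and `fake_llm` only records prompts and returns a numbered label.

```python
from typing import Callable, List


def refine_summarize(docs: List[str], llm: Callable[[str], str]) -> List[str]:
    """Return the running summary after each document; the last item is the final one."""
    if not docs:
        return []
    steps = [llm(f"Write a concise summary of the following:\n\n{docs[0]}")]
    for doc in docs[1:]:
        prompt = (
            f"We have provided an existing summary up to a certain point: {steps[-1]}\n"
            "------------\n"
            f"{doc}\n"
            "------------\n"
            "Given the new context, refine the original summary."
        )
        steps.append(llm(prompt))
    return steps


prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in: records the prompt and returns a numbered label."""
    prompts_seen.append(prompt)
    return f"summary v{len(prompts_seen)}"


steps = refine_summarize(["chunk A", "chunk B", "chunk C"], fake_llm)
print(steps)  # ['summary v1', 'summary v2', 'summary v3']
```

Each call depends on the previous result, so the calls cannot run in parallel, but every step sees the summary built so far.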

In [20]:
chain = load_summarize_chain(llm, chain_type="refine")

summary = chain.run(docs)
pprint.pprint(summary, width=80, indent=0)

(' In Venom Part 2, Let There Be Carnage, two young lovers, Cletus and '
 'Francis, are separated when Francis is taken to a top-secret institute to '
 'lock her powers away. Twenty-five years later, Cletus is about to be '
 'convicted of murder and Eddie is asked to visit him in prison. Eddie '
 "discovers carvings in Cletus' cell room and, with the help of Venom, is able "
 'to pinpoint the location of the missing buried bodies at Rodeo Beach, '
 'solving the case overnight.')


**Intermediate Steps**

We can also return the intermediate steps for `refine` chains, should we want to inspect them. This is done with the `return_intermediate_steps` argument.

In [21]:
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True)

out = chain({"input_documents": docs}, return_only_outputs=True)
pprint.pprint(out['intermediate_steps'], width=80, indent=0)
pprint.pprint(out['output_text'], width=80, indent=0)

[' In Venom Part 2, Let There Be Carnage, two young lovers, Cletus and Francis, '
'are separated when Francis is taken to a top-secret institute to lock her '
'powers away. Twenty-five years later, Cletus is about to be convicted of '
'murder and Eddie is asked to visit him in prison. Eddie discovers carvings in '
"Cletus' cell room and, with the help of Venom, is able to pinpoint the "
'location of the missing buried bodies at Rodeo Beach, solving the case '
'overnight.']
(' In Venom Part 2, Let There Be Carnage, two young lovers, Cletus and '
 'Francis, are separated when Francis is taken to a top-secret institute to '
 'lock her powers away. Twenty-five years later, Cletus is about to be '
 'convicted of murder and Eddie is asked to visit him in prison. Eddie '
 "discovers carvings in Cletus' cell room and, with the help of Venom, is able "
 'to pinpoint the location of the missing buried bodies at Rodeo Beach, '
 'solving the case overnight.')


**Custom Prompts**

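You can also use your own prompts with this chain. In this example, the initial prompt asks the model to keep the `<number.number>` timestamp delimiters, and a separate refine prompt controls how each later document updates the summary.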

In [22]:
prompt_template = """Write a concise summary of the following:


{text}


RETAINING TIMESTAMPT DELIMITER MARKED WITH <number.number>, for example you will find <0.0>:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
# Adjacent string literals are concatenated, so each fragment that continues
# a sentence needs its own trailing space.
refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary. "
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True, question_prompt=PROMPT, refine_prompt=refine_prompt)
out = chain({"input_documents": docs}, return_only_outputs=True)
pprint.pprint(out['intermediate_steps'], width=80, indent=0)
pprint.pprint(out['output_text'], width=80, indent=0)

['\n'
'\n'
'In Venom Part 2: Let There Be Carnage, two young lovers, Cletus and Francis, '
'living at a reform school in California, have their rooms directly above each '
"other and use a pipe to communicate. Francis' ultrasonic power grows stronger "
'and the school sends her to a top-secret institute to lock her powers '
'forever. During the trip, Francis shouts loud enough to damage the '
"detective's left ear and tries to grab the gun from him, but he shoots her "
'first. Francis wakes up the next day, but is locked up in a thick glass cage. '
'Twenty-five years later, Cletus is convicted of death once the FBI locates '
'the missing bodies of the people he murdered. Detective summons Eddie to the '
"police station to tackle Cletus' special request to see him. Eddie is "
'reluctant to accept the invitation, but Venom sprouts behind him and Eddie '
'quickly drags Venom into the bathroom. Eddie meets Cletus in his cage and '
'Cletus knows his story sells, so he wants Eddie to publish

In [24]:
pprint.pprint(out['output_text'], width=80, indent=0)

('\n'
 '\n'
 'In Venom Part 2: Let There Be Carnage, two young lovers, Cletus and Francis, '
 'living at a reform school in California, have their rooms directly above '
 "each other and use a pipe to communicate. Francis' ultrasonic power grows "
 'stronger and the school sends her to a top-secret institute to lock her '
 'powers forever. During the trip, Francis shouts loud enough to damage the '
 "detective's left ear and tries to grab the gun from him, but he shoots her "
 'first. Francis wakes up the next day, but is locked up in a thick glass '
 'cage. Twenty-five years later, Cletus is convicted of death once the FBI '
 'locates the missing bodies of the people he murdered. Detective summons '
 "Eddie to the police station to tackle Cletus' special request to see him. "
 'Eddie is reluctant to accept the invitation, but Venom sprouts behind him '
 'and Eddie quickly drags Venom into the bathroom. Eddie meets Cletus in his '
 'cage and Cletus knows his story sells, so he wants Ed