# Chapter 3

## Summarizing a document bigger than the LLM’s context window

In [34]:
with open("./Moby-Dick.txt", 'r', encoding='utf-8') as f:
    moby_dick_book = f.read()

In [35]:
from langchain_openai import ChatOpenAI
from langchain_text_splitters import TokenTextSplitter
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnableParallel
import getpass

In [36]:
OPENAI_API_KEY = getpass.getpass('Enter your OPENAI_API_KEY')

Enter your OPENAI_API_KEY ········


In [37]:
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-4o-mini")

In [38]:
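# Split: break the book into 3,000-token chunks that overlap by 100 tokens,
# and wrap each chunk in a dict keyed by the {chunk} prompt variable
text_splitter = TokenTextSplitter(chunk_size=3000, chunk_overlap=100)

text_chunks_chain = RunnableLambda(
    lambda x: [{'chunk': text_chunk} for text_chunk in text_splitter.split_text(x)]
)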
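
The split step relies on overlapping windows, so text cut at a chunk boundary still appears with some context in the next chunk. The following is a minimal sketch of that idea in plain Python. It treats whitespace-separated words as tokens, which is an assumption made for illustration; `TokenTextSplitter` counts model tokens instead, and `split_with_overlap` is a hypothetical helper, not part of LangChain.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, advancing by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Words stand in for tokens here (assumption for illustration only)
words = "call me ishmael some years ago never mind how long".split()
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With a chunk size of 4 and an overlap of 1, the last word of each chunk is repeated as the first word of the next one.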

In [39]:
# Map
summarize_chunk_prompt_template = """
Write a concise summary of the following text, and include the main details.
Text: {chunk}
"""

summarize_chunk_prompt = PromptTemplate.from_template(summarize_chunk_prompt_template)
summarize_chunk_chain = summarize_chunk_prompt | llm

summarize_map_chain = RunnableParallel(
    {
        'summary': summarize_chunk_chain | StrOutputParser()
    }
)

In [40]:
# Reduce
summarize_summaries_prompt_template = """
Write a concise summary of the following text, which joins several summaries, and include the main details.
Text: {summaries}
"""

summarize_summaries_prompt = PromptTemplate.from_template(summarize_summaries_prompt_template)
summarize_reduce_chain = (
    RunnableLambda(lambda x: 
        {
            'summaries': '\n'.join([i['summary'] for i in x]), 
        })
    | summarize_summaries_prompt 
    | llm 
    | StrOutputParser()
)

In [41]:
map_reduce_chain = (
   text_chunks_chain
   | summarize_map_chain.map()
   | summarize_reduce_chain
)     
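
To make the data flow of the composed chain easier to follow, here is a rough sketch of the same map-reduce pattern with no LangChain or OpenAI dependency. `stub_llm` is a hypothetical stand-in for the model call (it only records its prompt and echoes the first word), so the sketch shows how the calls are structured, not how well they summarize.

```python
calls = []

def stub_llm(prompt):
    """Hypothetical stand-in for a model call: record the prompt, return its first word."""
    calls.append(prompt)
    return prompt.split()[0]

def map_reduce(chunks, llm):
    # Map: summarize every chunk independently of the others
    partial_summaries = [llm(chunk) for chunk in chunks]
    # Reduce: join the partial summaries and summarize them once more
    return llm("\n".join(partial_summaries))

result = map_reduce(["alpha one two", "beta three four", "gamma five six"], stub_llm)
print(len(calls))   # one call per chunk, plus one for the reduce step
print(result)
```

The point to notice is the cost: a document split into n chunks triggers n + 1 model calls, and the reduce prompt grows with the number of chunks.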

In [42]:
summary = map_reduce_chain.invoke(moby_dick_book)

In [43]:
print(summary)

The Project Gutenberg eBook of *Moby-Dick; or The Whale* by Herman Melville is available for free under the Project Gutenberg License, with updates released in June 2001 and August 2021. The narrator, Ishmael, reflects on his desire to go to sea whenever he feels restless, expressing a philosophical connection to the ocean and a preference for the life of a sailor. He embarks on a whaling voyage to seek adventure and authenticity, arriving in New Bedford and struggling to find affordable lodging. Ultimately, he settles at the Spouter Inn, where he contemplates a mysterious painting and the grim atmosphere filled with whaling artifacts. Ishmael's discomfort with sharing a room with an unknown harpooneer grows as he hears unsettling stories and experiences anxiety about his safety. He meets Queequeg, a tattooed stranger, whom he initially fears but eventually comes to accept, reflecting on their shared humanity. The narrative explores themes of adventure, curiosity, and the contrasts bet

## Summarizing across documents

In [48]:
from langchain_community.document_loaders import WikipediaLoader

wikipedia_loader = WikipediaLoader(query="Paestum", load_max_docs=2)
wikipedia_docs = wikipedia_loader.load()

In [49]:
from langchain_community.document_loaders import Docx2txtLoader
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import TextLoader

word_loader = Docx2txtLoader("Paestum/Paestum-Britannica.docx")
word_docs = word_loader.load()

pdf_loader = PyPDFLoader("Paestum/PaestumRevisited.pdf")
pdf_docs = pdf_loader.load()

txt_loader = TextLoader("Paestum/Paestum-Encyclopedia.txt")
txt_docs = txt_loader.load()

In [50]:
all_docs = wikipedia_docs + word_docs + pdf_docs + txt_docs

In [51]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
import getpass

In [52]:
OPENAI_API_KEY = getpass.getpass('Enter your OPENAI_API_KEY')

Enter your OPENAI_API_KEY ········


In [53]:
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-4o-mini")

In [54]:
doc_summary_template = """Write a concise summary of the following text:
{text}
DOC SUMMARY:"""
doc_summary_prompt = PromptTemplate.from_template(doc_summary_template)

doc_summary_chain = doc_summary_prompt | llm

In [55]:
refine_summary_template = """
You must produce a final summary from the current refined summary
which has been generated so far and from the content of an additional document.
This is the current refined summary generated so far: {current_refined_summary}
This is the content of the additional document: {text}
Only use the content of the additional document if it is useful,
otherwise return the current refined summary as it is."""

refine_summary_prompt = PromptTemplate.from_template(refine_summary_template)

refine_chain = refine_summary_prompt | llm | StrOutputParser()

In [56]:
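def refine_summary(docs):
    """Fold the documents into one summary, refining it one document at a time."""
    intermediate_steps = []
    current_refined_summary = ''
    for doc in docs:
        # Inputs for this step: the summary so far plus the next document
        intermediate_step = {
            "current_refined_summary": current_refined_summary,
            "text": doc.page_content,
        }
        intermediate_steps.append(intermediate_step)

        # The refined summary becomes the input to the next iteration
        current_refined_summary = refine_chain.invoke(intermediate_step)

    return {"final_summary": current_refined_summary,
            "intermediate_steps": intermediate_steps}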

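Unlike map-reduce, the refine loop is sequential: each step depends on the summary produced by the previous one. The sketch below isolates that control flow. `stub_refine` is a hypothetical stand-in for the refine prompt and model call (it simply appends the first word of each new text), so only the loop structure should be read as meaningful.

```python
def stub_refine(current_summary, text):
    """Hypothetical stand-in for the refine step: append the new text's first word."""
    first_word = text.split()[0]
    return f"{current_summary} {first_word}".strip()

def refine(texts, refine_step):
    summary = ""
    history = []
    for text in texts:
        # Record the summary as it stood before this document was seen
        history.append(summary)
        # The result of this step feeds the next iteration
        summary = refine_step(summary, text)
    return summary, history

final, history = refine(["Greek colony", "Roman town", "Doric temples"], stub_refine)
print(final)
print(history)
```

Because every step waits for the previous one, the calls cannot run in parallel, and the order of the documents can influence the final summary.
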
In [57]:
full_summary = refine_summary(all_docs)
print(full_summary)

{'final_summary': "**Final Summary:**\n\nPaestum, an ancient Greek city located on the coast of the Tyrrhenian Sea in Magna Graecia, was established around 600 BC by settlers from Sybaris and originally named Poseidonia. The city flourished as a Greek settlement for approximately two centuries, enjoying the status of an autonomous Greek polis with a defensive wall and four gates, likely built in phases. Notable structures include three Doric-style temples erected in the sixth and fifth centuries, traditionally referred to as dedicated to Hera I, Hera II, and Athena, although the exact deities may vary. The temples are notable for their exaggerated entasis and wide squat capitals, characteristics that have influenced Neo-Classical architecture in the 18th and 19th centuries. The city featured a Greek agora, complete with a bouleuterion or possible ekklesiasterion, and a heroon dedicated to the city's founder.\n\nPoseidonia thrived due to intense cultural and commercial exchange with bot