## StuffDocumentsChain Text Summarization

#### “stuff”: all of your documents are processed in a single prompt. This is the simplest approach.
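Conceptually, stuffing means concatenating the text of every document and placing the result into the prompt's `{text}` slot, so the model is called exactly once. The plain-Python sketch below illustrates only that idea and makes no LangChain calls. The helper name `stuff_prompt` is hypothetical, and the `"\n\n"` separator is an assumption (as far as I know it is the default `document_separator` of LangChain's `StuffDocumentsChain`).

```python
# A minimal sketch of the "stuff" strategy (not LangChain's implementation).
# Assumptions: documents are plain strings and "\n\n" separates them.
def stuff_prompt(documents: list[str], template: str, separator: str = "\n\n") -> str:
    """Concatenate all documents and insert the result into one prompt."""
    combined = separator.join(documents)
    # A single prompt means a single LLM call, so the combined text
    # must fit within the model's context window.
    return template.format(text=combined)


template = "Write a concise and short summary of the following document.\ndocument: `{text}`\n"
pages = ["Nepal is a landlocked country.", "Its GDP has been growing."]
print(stuff_prompt(pages, template))
```

Because everything goes into one call, this approach stops working once the documents exceed the model's context window, which is the problem the map-reduce and refine sections address.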

In [1]:
# pip install pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain_google_genai import GoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate,PromptTemplate



In [2]:
# Join the text of every page of a PDF into a single string.

def doc_words_join(inp):
    loader = PyPDFLoader(inp)
    # load() returns one Document per page; load_and_split() would additionally
    # re-split the pages into overlapping chunks, which can duplicate text here.
    pages = loader.load()
    # print(len(pages))  # number of pages in the document
    return "".join(page.page_content for page in pages)

In [3]:

allwords=doc_words_join('nepal.pdf')
print(allwords)

Nepal is a landlocked country located in South Asia, bordered by China to the north and India to the south, east, and west. I t is known for its diverse geography, 
including the Himalayas, which contain eight of the world's ten highest peaks, including Mount Everest, the highest point on Earth. Nepal is also culturally rich, with a 
blend of Hinduism and Buddhism influencing its traditions, festivals, and way of life.  
 
As of my last update in January 2022, Nepal's gross domestic product (GDP) has been steadily growing, albeit at a moderate pa ce. Agriculture, tourism, and 
remittances from Nepali workers abroad are significant contributors to the country's economy. However , Nepal faces various challenges, including political instability, 
infrastructure deficiencies, and geographical barriers that hinder economic development.  
 
Nepal's GDP growth rate has been around 6 -7% in recent years, with the service sector contributing the most to the economy, followed by agriculture and 

In [4]:
from langchain.docstore.document import Document
docs = [Document(page_content=allwords)]
print(docs)

[Document(page_content="Nepal is a landlocked country located in South Asia, bordered by China to the north and India to the south, east, and west. I t is known for its diverse geography, \nincluding the Himalayas, which contain eight of the world's ten highest peaks, including Mount Everest, the highest point on Earth. Nepal is also culturally rich, with a \nblend of Hinduism and Buddhism influencing its traditions, festivals, and way of life.  \n \nAs of my last update in January 2022, Nepal's gross domestic product (GDP) has been steadily growing, albeit at a moderate pa ce. Agriculture, tourism, and \nremittances from Nepali workers abroad are significant contributors to the country's economy. However , Nepal faces various challenges, including political instability, \ninfrastructure deficiencies, and geographical barriers that hinder economic development.  \n \nNepal's GDP growth rate has been around 6 -7% in recent years, with the service sector contributing the most to the econo

In [5]:

template = '''Write a concise and short summary of the following document.
document: `{text}`
'''
prompt = PromptTemplate(input_variables=['text'], template=template)

In [6]:
import os

# Read the API key from an environment variable instead of hard-coding it in the notebook.
llm_model = GoogleGenerativeAI(model="gemini-pro", google_api_key=os.environ["GOOGLE_API_KEY"])

In [9]:
chain = load_summarize_chain(
    llm=llm_model,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)

output_summary = chain.invoke(docs)

In [10]:
print(output_summary['output_text'])

Nepal is a landlocked country in South Asia known for its diverse geography, including the Himalayas, and its rich culture influenced by Hinduism and Buddhism. Its economy has been growing steadily, with agriculture, tourism, and remittances contributing significantly. However, Nepal faces challenges such as political instability, infrastructure deficiencies, and geographical barriers to economic development. Its GDP growth rate has been around 6-7% in recent years, but it remains one of the least developed countries with a relatively low GDP per capita. For the latest GDP and economic information, consult sources like the World Bank, IMF, or Nepal's Central Bureau of Statistics.


## Summarizing Large Documents Using Map Reduce (No Custom Prompt)


"At first, we have large documents. Then, the document contents are split into small chunks. Each chunk is sent to the language model (LLM) with a specific chunk prompt, and the respective summary is generated. All the summaries are combined together with a final prompt at last, and then sent to the LLM again to produce the final summary."


<p align="center">
  <img src="img\mapreduce.jpg" alt="Alt text" style="width: 60%; height: 60%;">
</p>
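The flow in the diagram can be sketched in plain Python with a stand-in for the LLM. `map_reduce_summarize` and `fake_llm` are hypothetical names, and the stand-in simply returns the first sentence of its input, so the sketch shows the call pattern (one call per chunk, then one combining call) rather than real summarization quality. LangChain's chain additionally wraps each call in its map and combine prompts.

```python
from typing import Callable


def map_reduce_summarize(
    chunks: list[str],
    summarize: Callable[[str], str],
    separator: str = "\n",
) -> str:
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    # Map step: one independent call per chunk (these could run in parallel).
    chunk_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: combine the partial summaries and summarize them once more.
    return summarize(separator.join(chunk_summaries))


calls: list[str] = []


def fake_llm(text: str) -> str:
    """Hypothetical stand-in for the LLM: records the call, returns the first sentence."""
    calls.append(text)
    return text.strip().split(".")[0].strip() + "."


chunks = [
    "Everest is 8,848 m high. It lies in Nepal.",
    "Lhotse is 8,516 m high. It is next to Everest.",
]
result = map_reduce_summarize(chunks, fake_llm)
print(result)      # Everest is 8,848 m high.
print(len(calls))  # 3 (two map calls + one reduce call)
```

The number of LLM calls grows with the number of chunks (N map calls plus at least one combine call), which is the price paid for handling documents that do not fit in a single prompt.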


In [11]:
# pip install pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain_google_genai import GoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate,PromptTemplate

In [12]:
# Join the text of every page of a PDF into a single string.

def doc_words_join(inp):
    loader = PyPDFLoader(inp)
    # load() returns one Document per page; load_and_split() would additionally
    # re-split the pages into overlapping chunks, which can duplicate text here.
    pages = loader.load()
    # print(len(pages))  # number of pages in the document
    return "".join(page.page_content for page in pages)

In [13]:
allwords=doc_words_join('Himalayan_Peaks_Nepal.pdf')
print(allwords[100:1500])

rest (8,848m) Nepal Tourism Board
Bhrikuti Mandap Kathmandu, Nepal
P. O. Box: 11018
Fax: 977-1-4256910
Tel: 977-1-4256909, 4256229
E-mail: info@ntb.org.npPublished by :Himalayan Peaks of
NEPAL
(8,000 meters and above) Mt. Everest (8,848m)All Images User Rights, Jagadish Tiwari
Images By  : Page - 7 - Harka Tamang
 : Page - 9, 13- Dilip Ali
 : Page - 17 - Raju Bhandari &  
 : Rest of Images by Jagadish Tiwari2011 Edition
© NTBThe information contained in this book has been outsourced from an expert writer while 
every effort has been made to ensure accuracy and reliability. However, in case of lapses 
and discrepancies, revisions and updates would be subsequently carried out in the forth -
coming issues.Himalayan Peaks of Nepal
3
Introduction/ The Eight-thousanders 
Mountains Over 8000m High 3
Mt.Everest 5
Mt.Kanchenjunga 7
Mt. Lhotse 9
Mt. Makalu 11 
Mt. Cho Oyu 13
Mt. Dhaulagiri 15
Mt. Manaslu 17
Mt. Annapurna 19
Some important Mountaineering Rules and Regulations 
Royalty for Mountai

In [14]:
import os

# Read the API key from an environment variable instead of hard-coding it in the notebook.
llm_model = GoogleGenerativeAI(model="gemini-pro", google_api_key=os.environ["GOOGLE_API_KEY"])

In [15]:
llm_model.get_num_tokens(allwords)  # count how many tokens the whole document contains

10599
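A token count like this is what decides whether the stuff approach is even possible. The sketch below turns that into a rule of thumb. `choose_chain_type` is a hypothetical helper, and the context budgets and the 1,000-token reserve are illustrative assumptions, not the actual limits of `gemini-pro`.

```python
def choose_chain_type(num_tokens: int, context_budget: int, reserve: int = 1000) -> str:
    """Return "stuff" if the whole document fits in one prompt, else "map_reduce".

    context_budget: usable input size of the model in tokens (an assumption you
    must set for your own model). reserve: headroom for the prompt template and
    the generated summary (also an assumption).
    """
    if num_tokens + reserve <= context_budget:
        return "stuff"
    return "map_reduce"


print(choose_chain_type(10599, context_budget=8000))   # map_reduce (hypothetical 8k budget)
print(choose_chain_type(10599, context_budget=30000))  # stuff (hypothetical 30k budget)
```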

In [16]:

# Split the text into chunks of at most 5000 characters, overlapping by 500 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=500)
chunks = text_splitter.create_documents([allwords])
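By default the splitter measures length with `len`, so `chunk_size` and `chunk_overlap` are counted in characters, not tokens. A simplified fixed-window splitter shows how the two numbers interact: each window starts `chunk_size - chunk_overlap` characters after the previous one. This is a deliberate simplification, and `split_fixed` is a hypothetical helper: `RecursiveCharacterTextSplitter` first tries to break on separators such as blank lines, newlines, and spaces, so its real chunk boundaries (and the count of 8 chunks below) will not match this sketch.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of two windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij"  # 10 characters
print(split_fixed(sample, chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of sending some text to the LLM twice.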

In [17]:
print(len(chunks))

8


In [18]:
print(chunks[0].page_content)

www.welcomenepal.comNepal Tourism Board   
Himalayan Peaks of
NEPAL
(8,000 meters and above) Mt. Everest (8,848m) Nepal Tourism Board
Bhrikuti Mandap Kathmandu, Nepal
P. O. Box: 11018
Fax: 977-1-4256910
Tel: 977-1-4256909, 4256229
E-mail: info@ntb.org.npPublished by :Himalayan Peaks of
NEPAL
(8,000 meters and above) Mt. Everest (8,848m)All Images User Rights, Jagadish Tiwari
Images By  : Page - 7 - Harka Tamang
 : Page - 9, 13- Dilip Ali
 : Page - 17 - Raju Bhandari &  
 : Rest of Images by Jagadish Tiwari2011 Edition
© NTBThe information contained in this book has been outsourced from an expert writer while 
every effort has been made to ensure accuracy and reliability. However, in case of lapses 
and discrepancies, revisions and updates would be subsequently carried out in the forth -
coming issues.Himalayan Peaks of Nepal
3
Introduction/ The Eight-thousanders 
Mountains Over 8000m High 3
Mt.Everest 5
Mt.Kanchenjunga 7
Mt. Lhotse 9
Mt. Makalu 11 
Mt. Cho Oyu 13
Mt. Dhaulagiri 15
Mt. 

In [21]:
chain = load_summarize_chain(
    llm=llm_model,
    chain_type='map_reduce',
    verbose=False
)

summary = chain.invoke(chunks)

In [22]:
print(summary['output_text'])

- Nepal is home to eight of the fourteen eight-thousanders, the highest mountains in the world.
- The Himalayas were formed around 50 million years ago and are still rising.
- The zone above 8000m is called the Death Zone due to the thin cold air.
- Since Everest's identification in 1852, it has attracted climbers worldwide.
- Kanchenjunga, Lhotse, Makalu, Cho Oyu, Dhaulagiri, Manaslu, and Annapurna I are the other eight-thousanders in Nepal.
- Mountaineering expeditions in Nepal require permits and follow strict regulations.
- Royalty fees vary depending on the mountain, season, and number of members.


## Map Reduce With Custom Prompts


"At first, we have large documents. Then, the document contents are split into small chunks. Each chunk is sent to the language model (LLM) with a our specific chunk prompt, and the respective summary is generated. All the summaries are combined together with a our custom final prompt at last, and then sent to the LLM again to produce the final summary."


<p align="center">
  <img src="img\mapreduce.jpg" alt="Alt text" style="width: 60%; height: 60%;">
</p>


In [23]:
# pip install pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain_google_genai import GoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate,PromptTemplate

In [24]:
# Join the text of every page of a PDF into a single string.

def doc_words_join(inp):
    loader = PyPDFLoader(inp)
    # load() returns one Document per page; load_and_split() would additionally
    # re-split the pages into overlapping chunks, which can duplicate text here.
    pages = loader.load()
    # print(len(pages))  # number of pages in the document
    return "".join(page.page_content for page in pages)

In [25]:
allwords=doc_words_join('Himalayan_Peaks_Nepal.pdf')
print(allwords[100:1500])

rest (8,848m) Nepal Tourism Board
Bhrikuti Mandap Kathmandu, Nepal
P. O. Box: 11018
Fax: 977-1-4256910
Tel: 977-1-4256909, 4256229
E-mail: info@ntb.org.npPublished by :Himalayan Peaks of
NEPAL
(8,000 meters and above) Mt. Everest (8,848m)All Images User Rights, Jagadish Tiwari
Images By  : Page - 7 - Harka Tamang
 : Page - 9, 13- Dilip Ali
 : Page - 17 - Raju Bhandari &  
 : Rest of Images by Jagadish Tiwari2011 Edition
© NTBThe information contained in this book has been outsourced from an expert writer while 
every effort has been made to ensure accuracy and reliability. However, in case of lapses 
and discrepancies, revisions and updates would be subsequently carried out in the forth -
coming issues.Himalayan Peaks of Nepal
3
Introduction/ The Eight-thousanders 
Mountains Over 8000m High 3
Mt.Everest 5
Mt.Kanchenjunga 7
Mt. Lhotse 9
Mt. Makalu 11 
Mt. Cho Oyu 13
Mt. Dhaulagiri 15
Mt. Manaslu 17
Mt. Annapurna 19
Some important Mountaineering Rules and Regulations 
Royalty for Mountai

In [26]:
import os

# Read the API key from an environment variable instead of hard-coding it in the notebook.
llm_model = GoogleGenerativeAI(model="gemini-pro", google_api_key=os.environ["GOOGLE_API_KEY"])

In [27]:
# Split the text into chunks of at most 5000 characters, overlapping by 500 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=500)
chunks = text_splitter.create_documents([allwords])

In [28]:
chunks_prompt="""
Please summarize the below Document. The summary must be very clear with proper phrases and easy words:
Document: `{text}`
Summary:
"""
map_prompt_template=PromptTemplate(input_variables=['text'], template=chunks_prompt)


In [29]:
system = """You are a document summarizer chatbot. You are very experienced and can provide the summary in a very good way.
The documents may contain very complex words, which may be difficult to understand, so you have to use easy words while
summarizing the documents. If the document is not a textual document or you did not get any text,
display output="Sir, would you please provide a textual PDF?" and don't give random outputs.

Note: If the document has numerical values such as cost, price, rate, profit, loss, or anything, try to represent that also in the summary.

You have to follow this template while giving the summary. You have to give the summary from the user input only; don't give your opinion or content
which is not given by the user document or text.

Document: Title which may be suitable for the document (it should be at the center of the document)

Summary: Summary of the document must be at least 15 lines. Tell what the document is about.

Key Points:

a.
b.
c.
d.
e.
......... other key points.........

"""

human="{text}" # yesma sab chunk ko summary combine vayera input hunxa     # yo ra mathi ko chunk prompt ma same hunparxa input field ko name


final_combine_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system),
    HumanMessagePromptTemplate.from_template(human),
])
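Both prompts are filled from the same variable, so it is worth checking which placeholders a template actually contains. Python's `string.Formatter().parse` does this for `{field}`-style templates. The helper `placeholders` is hypothetical, and the expectation that the variable is named `text` is an assumption based on `load_summarize_chain`'s defaults (as far as I know, `map_reduce_document_variable_name` and `combine_document_variable_name` both default to `"text"`).

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the {field} names used in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


map_template = "Please summarize the below Document:\nDocument: `{text}`\nSummary:\n"
combine_human = "{text}"
system_sample = "Key Points:\na.\nb.\n"  # a system prompt with no placeholders

# The map prompt and the combine prompt must use the same variable name.
assert placeholders(map_template) == placeholders(combine_human) == {"text"}
print(placeholders(map_template))   # {'text'}
print(placeholders(system_sample))  # set()
```

A stray `{` in a long system prompt would show up here as an extra placeholder, which is a likely cause of missing-variable errors with f-string templates.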







In [30]:

summary_chain = load_summarize_chain(
    llm=llm_model,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=final_combine_prompt,
    verbose=False
)

output = summary_chain.invoke(chunks)  


In [31]:
print(output['output_text'])

Title: The Majestic Eight-Thousanders of Nepal

Summary:
This document provides insights into the eight mountains in Nepal that stand above 8,000 meters (26,247 feet), known as the eight-thousanders. These peaks are located in the Himalayas, the youngest mountain range in the world. The eight-thousanders are Mount Everest, Kanchenjunga, Lhotse, Makalu, Cho Oyu, Dhaulagiri, Manaslu, and Annapurna I. The document also mentions the first successful expeditions to climb these peaks, the history of mountaineering in Nepal, and the regulations and fees for mountaineering expeditions.

Key Points:

a. Mount Everest, the highest peak globally, stands at 8,848 meters (29,032 feet) and was first climbed in 1953 by Edmund Hillary and Tenzing Norgay.

b. Kanchenjunga, the third highest mountain in the world, stands at 8,586 meters (28,169 feet) and was first climbed in 1955.

c. Lhotse, the fourth highest mountain in the world, stands at 8,516 meters (27,940 feet) and was first climbed in 1956.

d

## We Can Also Use Plain Prompt Templates

In [32]:
chunks_prompt="""
Please summarize the below docs:
docs: `{text}`
Summary:
"""
map_prompt_template=PromptTemplate(input_variables=['text'],
                                    template=chunks_prompt)

In [33]:
final_combine_prompt='''
Provide a final summary of the entire docs with these important points.
Add a generic motivational title.
Start the precise summary with an introduction and provide the
summary as numbered points.
docs: `{text}`
'''
final_combine_prompt_template=PromptTemplate(input_variables=['text'],
                                             template=final_combine_prompt)

In [34]:
summary_chain = load_summarize_chain(
    llm=llm_model,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=final_combine_prompt_template,
    verbose=False
)
output = summary_chain.invoke(chunks)

In [39]:
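print(output)  # prints the whole result dict, including the input chunks; output['output_text'] holds only the summary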

{'input_documents': [Document(page_content='www.welcomenepal.comNepal Tourism Board   \nHimalayan Peaks of\nNEPAL\n(8,000 meters and above) Mt. Everest (8,848m) Nepal Tourism Board\nBhrikuti Mandap Kathmandu, Nepal\nP. O. Box: 11018\nFax: 977-1-4256910\nTel: 977-1-4256909, 4256229\nE-mail: info@ntb.org.npPublished by :Himalayan Peaks of\nNEPAL\n(8,000 meters and above) Mt. Everest (8,848m)All Images User Rights, Jagadish Tiwari\nImages By  : Page - 7 - Harka Tamang\n : Page - 9, 13- Dilip Ali\n : Page - 17 - Raju Bhandari &  \n : Rest of Images by Jagadish Tiwari2011 Edition\n© NTBThe information contained in this book has been outsourced from an expert writer while \nevery effort has been made to ensure accuracy and reliability. However, in case of lapses \nand discrepancies, revisions and updates would be subsequently carried out in the forth -\ncoming issues.Himalayan Peaks of Nepal\n3\nIntroduction/ The Eight-thousanders \nMountains Over 8000m High 3\nMt.Everest 5\nMt.Kanchenjunga 

## RefineDocumentsChain for Summarization

<p align="center">
  <img src="img\refine.jpg" alt="Alt text" style="width: 100%; height: 100%;">
</p>


In [40]:
# pip install pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain_google_genai import GoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate,PromptTemplate

In [41]:
# Join the text of every page of a PDF into a single string.

def doc_words_join(inp):
    loader = PyPDFLoader(inp)
    # load() returns one Document per page; load_and_split() would additionally
    # re-split the pages into overlapping chunks, which can duplicate text here.
    pages = loader.load()
    # print(len(pages))  # number of pages in the document
    return "".join(page.page_content for page in pages)

In [42]:
allwords=doc_words_join('Himalayan_Peaks_Nepal.pdf')
print(allwords[100:500])

rest (8,848m) Nepal Tourism Board
Bhrikuti Mandap Kathmandu, Nepal
P. O. Box: 11018
Fax: 977-1-4256910
Tel: 977-1-4256909, 4256229
E-mail: info@ntb.org.npPublished by :Himalayan Peaks of
NEPAL
(8,000 meters and above) Mt. Everest (8,848m)All Images User Rights, Jagadish Tiwari
Images By  : Page - 7 - Harka Tamang
 : Page - 9, 13- Dilip Ali
 : Page - 17 - Raju Bhandari &  
 : Rest of Images by Jaga


In [43]:
import os

# Read the API key from an environment variable instead of hard-coding it in the notebook.
llm_model = GoogleGenerativeAI(model="gemini-pro", google_api_key=os.environ["GOOGLE_API_KEY"])

In [44]:
# Split the text into chunks of at most 5000 characters, overlapping by 500 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=500)
chunks = text_splitter.create_documents([allwords])


In [45]:
print(chunks[0].page_content[0:900])

www.welcomenepal.comNepal Tourism Board   
Himalayan Peaks of
NEPAL
(8,000 meters and above) Mt. Everest (8,848m) Nepal Tourism Board
Bhrikuti Mandap Kathmandu, Nepal
P. O. Box: 11018
Fax: 977-1-4256910
Tel: 977-1-4256909, 4256229
E-mail: info@ntb.org.npPublished by :Himalayan Peaks of
NEPAL
(8,000 meters and above) Mt. Everest (8,848m)All Images User Rights, Jagadish Tiwari
Images By  : Page - 7 - Harka Tamang
 : Page - 9, 13- Dilip Ali
 : Page - 17 - Raju Bhandari &  
 : Rest of Images by Jagadish Tiwari2011 Edition
© NTBThe information contained in this book has been outsourced from an expert writer while 
every effort has been made to ensure accuracy and reliability. However, in case of lapses 
and discrepancies, revisions and updates would be subsequently carried out in the forth -
coming issues.Himalayan Peaks of Nepal
3
Introduction/ The Eight-thousanders 
Mountains Over 8000m Hig


In [46]:
print(chunks[1].page_content[0:1300])

was a mad rush to climb from the south. 
A few expeditions were permitted each 
year and it was only in May 1953 that 
Edmund Hillary and T enzing Norgay 
reached the top of Everest becoming the 
first humans to do so. The French who 
conquered the first eight-thousander 
seemed to have set the ball rolling for 
the rest of the world to try and be the 
first to climb a virgin eight-thousander. 
The mountaineers were surprisingly 
successful during the 1950s and this 
decade subsequently became know as 
the Golden Decade of Climbing. All but 
two of the fourteen eight-thousanders 
were climbed during the 1950s.Himalayan Peaks of Nepal
7
Ever since the highest peak in the world was identified by an  
employee of the Geological Survey of India in 1852, the moun -
tain has fascinated and drawn climbers from around the world. 
Known simply as Peak XV when the historic discovery was made, 
it was eventually named ‘Everest’ by the then Surveyor General, 
Andrew Waugh in honour of his predeces

<p align="center">
  <img src="img\refine.jpg" alt="Alt text" style="width: 100%; height: 100%;">
</p>


In [47]:
prompt_template = """Write a concise summary of the following:
{text}    
CONCISE SUMMARY:"""
question_prompt = PromptTemplate.from_template(prompt_template)

# {text} is the placeholder for the first chunk.

In [48]:
refine_template = (
    "Your job is to produce a final summary.\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "  # trailing space: adjacent literals are concatenated
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in paragraphs with a nice heading. "
    "If the context isn't useful, return the original summary."
)
refine_prompt = PromptTemplate.from_template(template=refine_template)

# {existing_answer} is the placeholder for the previous summary, and {text} is the placeholder for the next chunk.
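The loop these two templates drive can be sketched in plain Python: the question prompt summarizes the first chunk, and every later chunk is folded into the running summary through the refine prompt. `refine_summarize` and `fake_llm` are hypothetical, and the stand-in LLM only returns a version label so that the sequence of calls is visible. LangChain's `chain_type="refine"` follows this pattern, but its internals may differ in detail.

```python
from typing import Callable


def refine_summarize(
    chunks: list[str],
    llm: Callable[[str], str],
    question_template: str,
    refine_template: str,
) -> tuple[str, list[str]]:
    """Summarize the first chunk, then refine the summary with each later chunk.

    Returns the final summary and the list of intermediate summaries."""
    if not chunks:
        raise ValueError("need at least one chunk")
    summary = llm(question_template.format(text=chunks[0]))
    steps = [summary]
    for chunk in chunks[1:]:
        # Each call sees the running summary plus exactly one new chunk.
        summary = llm(refine_template.format(existing_answer=summary, text=chunk))
        steps.append(summary)
    return summary, steps


prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LLM: records the prompt, returns a label."""
    prompts.append(prompt)
    return f"summary v{len(prompts)}"


final, steps = refine_summarize(
    ["chunk A", "chunk B", "chunk C"],
    fake_llm,
    question_template="Summarize: {text}",
    refine_template="Existing: {existing_answer}\nNew: {text}",
)
print(final)  # summary v3
print(steps)  # ['summary v1', 'summary v2', 'summary v3']
```

Unlike map-reduce, these calls cannot run in parallel, because each step needs the previous summary. With `return_intermediate_steps=True`, the LangChain chain also returns the per-step summaries.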

In [49]:

chain = load_summarize_chain(
    llm=llm_model,
    chain_type="refine",
    question_prompt=question_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
    verbose=False,
)


result = chain({"input_documents": chunks}, return_only_outputs=True)

Retrying langchain_google_genai.llms._completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised InternalServerError: 500 An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting.


In [50]:
print(result["output_text"].replace("*",""))


Mountaineering Royalty Fees in Nepal

The mountaineering royalty fees in Nepal vary depending on the season, the height of the peak, and the size of the expedition team. The fees are highest during the spring and autumn seasons and lowest during the winter and summer seasons. The fees are also higher for peaks that are taller than 7,000 meters.

Fees for Different Peak Heights

 Peaks below 6,500 meters: $1,000 for one member, $1,200 for two members, $1,400 for three members, $1,600 for four members, $1,800 for five members, $1,900 for six members, $2,000 for seven members, and $300 for each additional member.
 Peaks between 6,501 and 6,999 meters (excluding Mt. Ama Dablam): $1,500 for one member, $1,800 for two members, $2,100 for three members, $2,400 for four members, $2,600 for five members, $2,800 for six members, $3,000 for seven members, and $400 for each additional member.
 Peaks between 7,000 and 7,500 meters: $1,000 for one member, $1,200 for two members, $1,400 for three mem
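`.replace("*", "")` removes every asterisk, including bullet markers, which is why the fee lines above start with a bare space. A slightly more careful post-processing step is sketched below. `strip_markdown_emphasis` is a hypothetical helper, and its regular expressions are a heuristic that assumes the model formats bullets as `* ` and emphasis as `**...**` or `*...*`.

```python
import re


def strip_markdown_emphasis(text: str) -> str:
    """Remove **bold** / *italic* markers and turn "* " bullets into "- " bullets."""
    # Rewrite bullet markers at the start of a line first, so they are not
    # mistaken for emphasis markers below.
    text = re.sub(r"^(\s*)\* ", r"\1- ", text, flags=re.MULTILINE)
    # Drop paired emphasis markers (** or *) around text on a single line.
    text = re.sub(r"(\*\*|\*)(\S(?:.*?\S)?)\1", r"\2", text)
    return text


raw = "**Fees for Different Peak Heights**\n* Peaks below 6,500 meters: $1,000 for one member"
print(strip_markdown_emphasis(raw))
```

Expressions such as `2*3 and 4*5` would still be mangled, so treat this as a convenience, not a Markdown parser.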