# 06. MapReduce + Custom Prompt Text Summarization Using LangChain

Usage: Large Document Summarization

Concept:
1. Split the large document into chunks.
2. Send each chunk to the LLM.
3. The LLM returns a summary for every chunk (map step).
4. Create a custom prompt for the final summary.
5. The LLM combines the chunk summaries into a final summary based on that prompt (reduce step).
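
The steps above can be sketched in plain Python, independent of LangChain. This is only a minimal illustration of the map-reduce idea, not the library's implementation: `split_into_chunks` is a naive fixed-size splitter, and `fake_summarize` is a hypothetical stand-in for an LLM call that simply keeps the first sentence.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Naive fixed-size splitter (hypothetical helper, no overlap)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first sentence."""
    first = text.strip().split(".")[0].strip()
    return first + "." if first else ""


def map_reduce_summarize(
    text: str,
    chunk_size: int,
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[str], str],
) -> str:
    # Map step: summarize every chunk independently.
    partial_summaries = [map_fn(chunk) for chunk in split_into_chunks(text, chunk_size)]
    # Reduce step: join the partial summaries and summarize them once more.
    combined = "\n".join(s for s in partial_summaries if s)
    return reduce_fn(combined)
```

In the notebook, the map and reduce functions are both LLM calls, each driven by its own prompt.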

In [1]:
import os

from dotenv import load_dotenv

# Load variables from .env
load_dotenv()

True

In [2]:
openai_api_key = os.environ['OPENAI_API_KEY']

In [3]:
from PyPDF2 import PdfReader

from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [4]:
# Path to the PDF file to summarize
pdfreader = PdfReader('document/SPEECH-YAB-PM-FOR-CCOE.pdf')

In [5]:
# Read the text from every page of the PDF
text: str = ""
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        text += content

In [6]:
text

" \n1         SPEECH  YAB DATO’ SERI ANWAR BIN IBRAHIM PRIME MINISTER OF MALAYSIA       THE GRAND OPENING OF  CYBERSECURITY CENTER OF EXCELLENCE (CCoE)         26 MARCH 2024 (TUESDAY), 1.45 PM CCoE, CYBERJAYA   \n \n2  SALUTATIONS  1. The Honourable Mrs. Mary Ng Minister of Export Promotion, International Trade and Economic Development of Canada   2. Yang Berhormat Fahmi Fadzil  Minister of Communications   3. YBhg. Datuk Mohamad Fauzi bin Md. Isa Secretary-General, Ministry of Communications  4. YM Raja Dato’ Nurshirwan Zainal Abidin Director General, National Security Council  5. Yang Berusaha Tan Sri Mohamad Salim bin Fateh Din  Chairman, Malaysian Communications and Multimedia Commission (MCMC)   6. Mr. John J. Giamatteo  Chief Executive Officer, BlackBerry   7. Representatives of the industry and ecosystem,   8. Distinguished guests,    9. Members of the media,  10. Ladies and gentlemen,          \n3  Assalamualaikum Warahmatullahi Wabarakatuh and Salam Malaysia MADANI.  1. It has
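
The extracted text above contains long runs of spaces and near-empty lines left over from the PDF layout. As an optional step, the whitespace could be normalized before splitting. The sketch below is an assumption, not part of the original notebook, and `normalize_whitespace` is a hypothetical helper name.

```python
import re


def normalize_whitespace(raw: str) -> str:
    """Collapse runs of spaces/tabs, trim each line, and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Calling `text = normalize_whitespace(text)` before counting tokens would likely change the token count and the number of chunks reported in the cells below.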

In [7]:
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

In [8]:
llm.get_num_tokens(text)

1124

In [9]:
## Splitting the text into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

chunks = text_splitter.create_documents([text])

In [10]:
len(chunks)

8
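
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified character-window splitter. It is a sketch under stated assumptions, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraph and line breaks, so its chunk boundaries will differ. `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Simplified splitter: fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With `chunk_size=4` and `chunk_overlap=2`, each chunk repeats the last two characters of the previous one, which helps keep context that would otherwise be cut at a chunk boundary.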

In [11]:
# This prompt is applied to each chunk individually (map step)
chunks_prompt = """
'Write a concise and short summary of the following speech in a professional tone.
Speech: `{text}`
Summary:
"""

map_prompt_template = PromptTemplate(input_variables=["text"], template=chunks_prompt)

In [12]:
# This prompt is applied once to the combined chunk summaries (reduce step)
final_combine_prompt = """
Provide a final summary of the entire speech with these important points.
Add a Generic Motivational Title,
Start the precise summary with an introduction and provide the summary in number points for the speech.
Provide the statistical value of the speech.
Speech: `{text}`
"""

combine_prompt_template = PromptTemplate(input_variables=["text"], template=final_combine_prompt)
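
Both templates rely on a single `{text}` placeholder. The sketch below shows the substitution idea with plain `str.format`; it is an illustration only and assumes nothing about `PromptTemplate` internals. `fill_prompt` and the short template are hypothetical.

```python
def fill_prompt(template: str, **variables: str) -> str:
    """Substitute named placeholders such as {text} into a prompt template."""
    return template.format(**variables)


# Shortened, hypothetical template that uses the same {text} placeholder
demo_template = "Summarize the following speech.\nSpeech: `{text}`\nSummary:"
demo_prompt = fill_prompt(demo_template, text="Cybersecurity matters.")
```

In the map step the placeholder receives one chunk at a time; in the combine step it receives the joined chunk summaries.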

In [13]:
# Build a map_reduce chain: summarize each chunk with the map prompt,
# then combine the partial summaries with the combine prompt
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=True,
)

In [14]:
summary = chain.run(chunks)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
'Write a concise and short summary of the following speech in a professional tone.
Speech: `1         SPEECH  YAB DATO’ SERI ANWAR BIN IBRAHIM PRIME MINISTER OF MALAYSIA       THE GRAND OPENING OF  CYBERSECURITY CENTER OF EXCELLENCE (CCoE)         26 MARCH 2024 (TUESDAY), 1.45 PM CCoE, CYBERJAYA   
 
2  SALUTATIONS  1. The Honourable Mrs. Mary Ng Minister of Export Promotion, International Trade and Economic Development of Canada   2. Yang Berhormat Fahmi Fadzil  Minister of Communications   3. YBhg. Datuk Mohamad Fauzi bin Md. Isa Secretary-General, Ministry of Communications  4. YM Raja Dato’ Nurshirwan Zainal Abidin Director General, National Security Council  5. Yang Berusaha Tan Sri Mohamad Salim bin Fateh Din  Chairman, Malaysian Communications and Multimedia Commission (MCMC)   6. Mr. John J. Giamatteo  Chief Executive Officer, BlackBerry   

In [15]:
summary

'Title: "Empowering Cybersecurity: Building a Resilient Future"\n\nIntroduction:\nPrime Minister Anwar Ibrahim of Malaysia delivered a speech at the grand opening of the Cybersecurity Center of Excellence (CCoE) in Cyberjaya, emphasizing the importance of cybersecurity in today\'s digital age and the role of the CCoE in enhancing Malaysia\'s cybersecurity capabilities.\n\nSummary:\n1. The establishment of the Cybersecurity Center of Excellence (CCoE) marks a significant step towards strengthening Malaysia\'s cybersecurity capabilities, aiming to train and upskill the national workforce and become an international hub for addressing emerging cyber threats.\n2. Intelligence sharing and collaboration are crucial in strengthening national and regional cyber-resilience, with the CCoE fostering partnerships between the public and private sectors to enhance cybersecurity through information sharing and expertise exchange.\n3. Malaysia faces a shortage of cybersecurity professionals, with effo