# 04. MapReduce Text Summarization Using LangChain

Usage: Large Document Summarization

Concept:
1. Split the large document into chunks.
2. Send each chunk to the LLM.
3. The LLM returns a summary for each chunk (the map step).
4. The LLM combines the per-chunk summaries into one final summary (the reduce step).
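The steps above can be sketched in plain Python, independent of LangChain. This is only an illustration of the idea, not how `load_summarize_chain` is implemented: `split_into_chunks`, `map_reduce_summarize`, and the stand-in `first_sentence` summarizer are hypothetical names, and a real pipeline would call an LLM where the stand-in is used.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],
    chunk_size: int = 1000,
) -> str:
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    chunks = split_into_chunks(text, chunk_size)
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    return summarize("\n".join(partial_summaries))  # reduce step


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."
```

Passing the summarizer in as a function keeps the map and reduce steps visible and makes the sketch easy to try without an API key, for example `map_reduce_summarize(text, first_sentence)`.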

In [1]:
import os
from dotenv import load_dotenv

# Load variables from .env
load_dotenv()

True

In [2]:
openai_api_key = os.environ['OPENAI_API_KEY']

In [5]:
from PyPDF2 import PdfReader

from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [6]:
# Path to the PDF file to summarize
pdfreader = PdfReader('document/SPEECH-YAB-PM-FOR-CCOE.pdf')

In [7]:
# Read the text from every page of the PDF
text: str = ''
for page in pdfreader.pages:
    content = page.extract_text()
    # extract_text() can return an empty result for pages without text
    if content:
        text += content

In [8]:
text

" \n1         SPEECH  YAB DATO’ SERI ANWAR BIN IBRAHIM PRIME MINISTER OF MALAYSIA       THE GRAND OPENING OF  CYBERSECURITY CENTER OF EXCELLENCE (CCoE)         26 MARCH 2024 (TUESDAY), 1.45 PM CCoE, CYBERJAYA   \n \n2  SALUTATIONS  1. The Honourable Mrs. Mary Ng Minister of Export Promotion, International Trade and Economic Development of Canada   2. Yang Berhormat Fahmi Fadzil  Minister of Communications   3. YBhg. Datuk Mohamad Fauzi bin Md. Isa Secretary-General, Ministry of Communications  4. YM Raja Dato’ Nurshirwan Zainal Abidin Director General, National Security Council  5. Yang Berusaha Tan Sri Mohamad Salim bin Fateh Din  Chairman, Malaysian Communications and Multimedia Commission (MCMC)   6. Mr. John J. Giamatteo  Chief Executive Officer, BlackBerry   7. Representatives of the industry and ecosystem,   8. Distinguished guests,    9. Members of the media,  10. Ladies and gentlemen,          \n3  Assalamualaikum Warahmatullahi Wabarakatuh and Salam Malaysia MADANI.  1. It has

In [10]:
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

In [11]:
llm.get_num_tokens(text)

1124

In [14]:
# Split the text into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

chunks = text_splitter.create_documents([text])

In [15]:
len(chunks)

8
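To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch in plain Python. It is an assumption-laden illustration, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as paragraph breaks, newlines, and spaces, so its chunks are often shorter than `chunk_size` and the chunk count can differ. The function name `sliding_chunks` is hypothetical.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Fixed-width chunks where each chunk repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of sending some text to the model twice.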

In [16]:
# Build a map_reduce chain: summarize each chunk, then combine the summaries
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)

In [17]:
summary = chain.run(chunks)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"1         SPEECH  YAB DATO’ SERI ANWAR BIN IBRAHIM PRIME MINISTER OF MALAYSIA       THE GRAND OPENING OF  CYBERSECURITY CENTER OF EXCELLENCE (CCoE)         26 MARCH 2024 (TUESDAY), 1.45 PM CCoE, CYBERJAYA   
 
2  SALUTATIONS  1. The Honourable Mrs. Mary Ng Minister of Export Promotion, International Trade and Economic Development of Canada   2. Yang Berhormat Fahmi Fadzil  Minister of Communications   3. YBhg. Datuk Mohamad Fauzi bin Md. Isa Secretary-General, Ministry of Communications  4. YM Raja Dato’ Nurshirwan Zainal Abidin Director General, National Security Council  5. Yang Berusaha Tan Sri Mohamad Salim bin Fateh Din  Chairman, Malaysian Communications and Multimedia Commission (MCMC)   6. Mr. John J. Giamatteo  Chief Executive Officer, BlackBerry   7. Representatives of the industry and ecosystem

In [18]:
summary

'Prime Minister Anwar Ibrahim of Malaysia delivered a speech at the opening of the Cybersecurity Center of Excellence in Cyberjaya, attended by dignitaries and industry representatives. The center aims to enhance cybersecurity capabilities through collaboration with Canada, address the shortage of cybersecurity professionals, and promote regional cyber-resilience. Strategic collaborations and government efforts have attracted Foreign Direct Investment and stimulated economic growth, with a focus on advanced technologies like AI and Machine Learning. The center has graduated its first cohort of cyber defenders, contributing to cybersecurity skills and innovation.'