# 05. RefineChain + Custom Prompt Text Summarization Using LangChain

Usage: Large Document Summarization

Refine Chain workflow:
1. Split the large document into chunks  
2. Send the first chunk to the LLM  
3. The LLM returns a summary of that first chunk  
4. The first chunk's summary plus the 2nd chunk's text is sent back to the LLM to produce an updated summary  
5. The updated summary is combined with the 3rd chunk's text and sent to the LLM again to produce the next summary, and so on until every chunk has been processed  
6. Output the final summary, shaped by the custom prompt template
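The workflow above can be sketched in plain Python. This is a minimal illustration of the refine loop, not LangChain's actual implementation: `refine_summarize`, `fake_llm`, and the prompt wording are hypothetical names and text chosen for the example, and the "model" is a stand-in that only records the prompts it receives.

```python
from typing import Callable, List


def refine_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    """Summarize the first chunk, then refine that summary with each later chunk."""
    if not chunks:
        return ""
    # Steps 2-3: the first chunk alone produces the initial summary.
    summary = llm(f"Write a concise summary of the following:\n\n{chunks[0]}")
    # Steps 4-5: each later chunk is sent together with the running summary.
    for chunk in chunks[1:]:
        prompt = (
            f"Existing summary:\n{summary}\n\n"
            f"Refine it using this additional context:\n{chunk}"
        )
        summary = llm(prompt)
    # Step 6: the last refined summary is the final output.
    return summary


# Stand-in for a real model (assumption for illustration only):
# it records each prompt and returns a numbered placeholder.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary-{len(calls)}"


result = refine_summarize(["part one", "part two", "part three"], fake_llm)
print(result)
print(len(calls))
```

Because each call depends on the previous summary, the chunks are processed sequentially: a document split into N chunks costs N model calls, one after another.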

In [4]:
import os
from dotenv import load_dotenv

# Load variables from .env
load_dotenv()

True

In [5]:
openai_api_key = os.environ['OPENAI_API_KEY']

In [6]:
from PyPDF2 import PdfReader

from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [7]:
# Provide the path to the PDF file.
pdfreader = PdfReader('document/SPEECH-YAB-PM-FOR-CCOE.pdf')

In [8]:
# Read the text from every page of the PDF
text: str = ''
for page in pdfreader.pages:
    content = page.extract_text()
    # extract_text() can return an empty result for pages without a text layer
    if content:
        text += content

In [9]:
text

" \n1         SPEECH  YAB DATO’ SERI ANWAR BIN IBRAHIM PRIME MINISTER OF MALAYSIA       THE GRAND OPENING OF  CYBERSECURITY CENTER OF EXCELLENCE (CCoE)         26 MARCH 2024 (TUESDAY), 1.45 PM CCoE, CYBERJAYA   \n \n2  SALUTATIONS  1. The Honourable Mrs. Mary Ng Minister of Export Promotion, International Trade and Economic Development of Canada   2. Yang Berhormat Fahmi Fadzil  Minister of Communications   3. YBhg. Datuk Mohamad Fauzi bin Md. Isa Secretary-General, Ministry of Communications  4. YM Raja Dato’ Nurshirwan Zainal Abidin Director General, National Security Council  5. Yang Berusaha Tan Sri Mohamad Salim bin Fateh Din  Chairman, Malaysian Communications and Multimedia Commission (MCMC)   6. Mr. John J. Giamatteo  Chief Executive Officer, BlackBerry   7. Representatives of the industry and ecosystem,   8. Distinguished guests,    9. Members of the media,  10. Ladies and gentlemen,          \n3  Assalamualaikum Warahmatullahi Wabarakatuh and Salam Malaysia MADANI.  1. It has

In [10]:
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

In [11]:
llm.get_num_tokens(text)

1124

In [12]:
## Split the text into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

chunks = text_splitter.create_documents([text])

In [13]:
len(chunks)

8
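To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is only a sketch: `split_with_overlap` is a hypothetical helper, and it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces, so its chunks are usually somewhat shorter than `chunk_size`. Note that with the default length function both parameters are measured in characters, not tokens.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxy"  # 25 characters
demo_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in demo_chunks:
    print(chunk)
```

The overlap means the end of one chunk is repeated at the start of the next, which helps the model keep context when a sentence straddles a chunk boundary.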

In [17]:
# Build a summarization chain that uses the "refine" strategy
chain = load_summarize_chain(llm, chain_type="refine", verbose=True)
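The chain above runs with the default prompts. A custom prompt for the refine strategy needs two templates: one for the first chunk and one for each refinement step. The sketch below uses plain Python strings so it runs without an API key; the template wording is illustrative only. It assumes, for the LangChain version used here, that `load_summarize_chain` accepts `question_prompt` and `refine_prompt` for `chain_type="refine"` and that the refine prompt uses the variables `existing_answer` and `text`; check this against your installed version.

```python
# Template for the first chunk: only the chunk text is available.
question_template = (
    "Write a concise summary of the following speech excerpt:\n\n"
    "{text}\n\n"
    "CONCISE SUMMARY:"
)

# Template for every later chunk: running summary plus the new chunk text.
refine_template = (
    "Here is the summary so far:\n{existing_answer}\n\n"
    "Refine it (only if needed) using the additional context below.\n"
    "------------\n{text}\n------------\n"
    "Return the final summary as a title followed by bullet points."
)

# Show what the model would receive at each stage (placeholder inputs).
first_prompt = question_template.format(text="chunk one text")
second_prompt = refine_template.format(
    existing_answer="summary of chunk one",
    text="chunk two text",
)
print(first_prompt)
print(second_prompt)

# Assumed wiring into the chain (not executed here):
# question_prompt = PromptTemplate(template=question_template, input_variables=["text"])
# refine_prompt = PromptTemplate(
#     template=refine_template, input_variables=["existing_answer", "text"]
# )
# chain = load_summarize_chain(
#     llm, chain_type="refine",
#     question_prompt=question_prompt, refine_prompt=refine_prompt, verbose=True,
# )
```

Putting the output-format instruction in the refine template matters because that template produces every summary after the first, including the final one.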

In [18]:
summary = chain.run(chunks)



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"1         SPEECH  YAB DATO’ SERI ANWAR BIN IBRAHIM PRIME MINISTER OF MALAYSIA       THE GRAND OPENING OF  CYBERSECURITY CENTER OF EXCELLENCE (CCoE)         26 MARCH 2024 (TUESDAY), 1.45 PM CCoE, CYBERJAYA   
 
2  SALUTATIONS  1. The Honourable Mrs. Mary Ng Minister of Export Promotion, International Trade and Economic Development of Canada   2. Yang Berhormat Fahmi Fadzil  Minister of Communications   3. YBhg. Datuk Mohamad Fauzi bin Md. Isa Secretary-General, Ministry of Communications  4. YM Raja Dato’ Nurshirwan Zainal Abidin Director General, National Security Council  5. Yang Berusaha Tan Sri Mohamad Salim bin Fateh Din  Chairman, Malaysian Communications and Multimedia Commission (MCMC)   6. Mr. John J. Giamatteo  Chief Executive Officer, BlackBerry   7. Representatives of the industry and ecosystem,  

In [19]:
summary

"Dato’ Seri Anwar Bin Ibrahim, Prime Minister of Malaysia, delivered a speech at the grand opening of the Cybersecurity Center of Excellence (CCoE) in Cyberjaya on March 26, 2024. The event was attended by various government officials, industry representatives, and members of the media. The establishment of the CCoE signifies Malaysia's commitment to enhancing national cybersecurity capabilities and becoming an international hub for addressing emerging cyber threats. The center aims to facilitate collaboration between Malaysian and Canadian institutions to share knowledge, threat intelligence, and jointly develop methods and strategies to strengthen national and regional cyber-resilience. The CCoE is a milestone in our journey towards creating a robust cybersecurity ecosystem, necessitating strong partnerships across the public and private sectors. Malaysia has a shortfall of 12,000 cybersecurity professionals and aims to have 25,000 workers in cybersecurity by 2025. The CCoE is envisi