# Different Ways of Text Summarization using LangChain and Gemini

#### Set API key

In [195]:
import os
google_api_key = "{Google_API_Key}"
os.environ["GOOGLE_API_KEY"] = google_api_key
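
Hard-coding the key in a notebook makes it easy to leak. A minimal sketch of an alternative, assuming the key was already exported in the shell as `GOOGLE_API_KEY` before the notebook was started. The helper name `require_api_key` is hypothetical and not part of LangChain.

```python
import os

def require_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or empty."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key

# Hypothetical usage: google_api_key = require_api_key()
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the model client.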

#### Installation

Install LangChain's Python libraries, langchain and langchain_community.

In [196]:
!pip install --quiet langchain langchain_community

Install LangChain's integration package for Gemini, langchain-google-genai.

In [197]:
!pip install --quiet langchain-google-genai

### Basic Text Summarization

In [198]:
input_text = """
Today, AI and ML are at the forefront of change and driving a technological renaissance, redefining how humanity approaches its greatest challenges. The rapid progress has made AI an indispensable tool for global transformation. As NASSCOM and BCG forecast, India's AI market is on a meteoric rise, expected to reach $17 billion by 2027, becoming a key driver of economic growth and innovation.

However, there are challenges. India faces a 51% demand-supply gap in AI and ML professionals, with a demand of over 629,000 specialists but a supply of only 416,000. This gap presents a tremendous opportunity with 10x more projected jobs and 4x more higher salaries available to those who upskill in AI and ML. Recognising this demand, Emeritus and IITM Pravartak (the technological hub of IIT Madras) have launched the Advanced Certificate Programme in Applied Artificial Intelligence and Machine Learning.

This programme is tailored for professionals eager to leverage cutting-edge technologies to drive innovation and tackle complex problems. This unique programme offers participants a competitive edge in the industry, featuring rigorous instruction from distinguished IIT faculty and globally renowned AI and ML experts. The faculty lineup includes Professor C. Chandra Sekhar, former Head of the CSE Department at IIT Madras (2019–22), and Professor Dileep A.D., Head of the CSE Department at IIT Dharwad and a research scholar at IIT Madras. These award-winning educators have published numerous research papers in prestigious international journals, making them ideal mentors for this transformative journey. The programme further combines IIT faculty teachings with live sessions led by industry experts, delving into real-world applications and tools. Participants will also engage in hands-on labs designed to cultivate practical expertise, ensuring that they graduate as well-rounded AI and ML professionals ready to meet the demands of this rapidly growing field.
"""

In [199]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import SystemMessage, HumanMessage

# Instantiate Model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.7, top_p=0.85)

In [200]:
# check total tokens
llm.get_num_tokens(input_text)

393

In [201]:
# Call the model with .invoke(); calling llm(...) directly is deprecated
summary = llm.invoke([
    SystemMessage(content="You are a skilled assistant with expertise in summarizing text"),
    HumanMessage(content=f'Please provide a concise summary of the following \n TEXT: {input_text}')
])

In [202]:
# check model response
summary

AIMessage(content="AI and machine learning are transforming industries, and India's AI market is projected to reach $17 billion by 2027.  However, a significant skills gap exists. To address this, Emeritus and IITM Pravartak have launched an Advanced Certificate Programme in Applied AI and ML.  The program, taught by renowned IIT faculty and industry experts, offers rigorous training and hands-on experience to equip professionals with the skills needed to succeed in this burgeoning field.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []}, id='run-36336c6d-0435-4d7f-bb16-a50434f76932-0', usage_metadata={'input_tokens': 416, 'output_tokens': 97, 'total_tokens': 513, 'input_token_details': {'cache_read': 0}})

In [203]:
# get content
print(summary.content)

AI and machine learning are transforming industries, and India's AI market is projected to reach $17 billion by 2027.  However, a significant skills gap exists. To address this, Emeritus and IITM Pravartak have launched an Advanced Certificate Programme in Applied AI and ML.  The program, taught by renowned IIT faculty and industry experts, offers rigorous training and hands-on experience to equip professionals with the skills needed to succeed in this burgeoning field.


### Prompt Template Text Summarization

In [204]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

In [205]:
input_text = """
Today, AI and ML are at the forefront of change and driving a technological renaissance, redefining how humanity approaches its greatest challenges. The rapid progress has made AI an indispensable tool for global transformation. As NASSCOM and BCG forecast, India's AI market is on a meteoric rise, expected to reach $17 billion by 2027, becoming a key driver of economic growth and innovation.

However, there are challenges. India faces a 51% demand-supply gap in AI and ML professionals, with a demand of over 629,000 specialists but a supply of only 416,000. This gap presents a tremendous opportunity with 10x more projected jobs and 4x more higher salaries available to those who upskill in AI and ML. Recognising this demand, Emeritus and IITM Pravartak (the technological hub of IIT Madras) have launched the Advanced Certificate Programme in Applied Artificial Intelligence and Machine Learning.

This programme is tailored for professionals eager to leverage cutting-edge technologies to drive innovation and tackle complex problems. This unique programme offers participants a competitive edge in the industry, featuring rigorous instruction from distinguished IIT faculty and globally renowned AI and ML experts. The faculty lineup includes Professor C. Chandra Sekhar, former Head of the CSE Department at IIT Madras (2019–22), and Professor Dileep A.D., Head of the CSE Department at IIT Dharwad and a research scholar at IIT Madras. These award-winning educators have published numerous research papers in prestigious international journals, making them ideal mentors for this transformative journey. The programme further combines IIT faculty teachings with live sessions led by industry experts, delving into real-world applications and tools. Participants will also engage in hands-on labs designed to cultivate practical expertise, ensuring that they graduate as well-rounded AI and ML professionals ready to meet the demands of this rapidly growing field.
"""

In [214]:
# Define prompt template and Instantiate prompt
template="""
Write a concise summary of the following text : `{input_text}`
Translate the summary into {language}.
"""
prompt=PromptTemplate(
    input_variables=['input_text','language'],
    template=template
)
# Keep the formatted prompt so its tokens can be counted later
final_prompt = prompt.format(input_text=input_text, language='French')
final_prompt

"\nWrite a concise summary of the following text : `\nToday, AI and ML are at the forefront of change and driving a technological renaissance, redefining how humanity approaches its greatest challenges. The rapid progress has made AI an indispensable tool for global transformation. As NASSCOM and BCG forecast, India's AI market is on a meteoric rise, expected to reach $17 billion by 2027, becoming a key driver of economic growth and innovation.\n\nHowever, there are challenges. India faces a 51% demand-supply gap in AI and ML professionals, with a demand of over 629,000 specialists but a supply of only 416,000. This gap presents a tremendous opportunity with 10x more projected jobs and 4x more higher salaries available to those who upskill in AI and ML. Recognising this demand, Emeritus and IITM Pravartak (the technological hub of IIT Madras) have launched the Advanced Certificate Programme in Applied Artificial Intelligence and Machine Learning.\n\nThis programme is tailored for profe

In [215]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Instantiate model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.7, top_p=0.85)

In [216]:
# check total tokens
llm.get_num_tokens(final_prompt)

412

In [217]:
llm_chain=LLMChain(llm=llm,prompt=prompt)
summary=llm_chain.invoke({'input_text':input_text,'language':'French'})

In [218]:
# check model response
summary

{'input_text': "\nToday, AI and ML are at the forefront of change and driving a technological renaissance, redefining how humanity approaches its greatest challenges. The rapid progress has made AI an indispensable tool for global transformation. As NASSCOM and BCG forecast, India's AI market is on a meteoric rise, expected to reach $17 billion by 2027, becoming a key driver of economic growth and innovation.\n\nHowever, there are challenges. India faces a 51% demand-supply gap in AI and ML professionals, with a demand of over 629,000 specialists but a supply of only 416,000. This gap presents a tremendous opportunity with 10x more projected jobs and 4x more higher salaries available to those who upskill in AI and ML. Recognising this demand, Emeritus and IITM Pravartak (the technological hub of IIT Madras) have launched the Advanced Certificate Programme in Applied Artificial Intelligence and Machine Learning.\n\nThis programme is tailored for professionals eager to leverage cutting-e

In [219]:
# get content
summary["text"]

"AI and machine learning are transforming industries, and India's AI market is projected to reach $17 billion by 2027.  However, a significant skills gap exists, with demand for AI/ML professionals far exceeding supply. To address this, Emeritus and IITM Pravartak have launched an advanced certificate program in applied AI and ML, taught by renowned IIT faculty and industry experts, offering rigorous training and practical experience to prepare professionals for this burgeoning field.\n\n\nFrench Translation:\n\nL'intelligence artificielle et l'apprentissage automatique transforment les industries, et le marché indien de l'IA devrait atteindre 17 milliards de dollars d'ici 2027. Cependant, il existe un écart important en matière de compétences, la demande de professionnels de l'IA/ML dépassant largement l'offre. Pour y remédier, Emeritus et IITM Pravartak ont lancé un programme de certificat avancé en IA et ML appliquées, dispensé par des professeurs renommés de l'IIT et des experts de

### StuffDocumentChain Text Summarization

The stuff chain combines all documents into a single prompt and sends it to the model in one call.
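
The idea of stuffing can be sketched in plain Python, without LangChain or a model call. This is only an illustration under stated assumptions: the helper `stuff_prompt`, the blank-line separator, and the two sample strings are made up for the sketch.

```python
def stuff_prompt(documents, template="Write a concise summary of the following: `{text}`"):
    """Join every document into one string and fill a single prompt with it."""
    combined = "\n\n".join(documents)
    return template.format(text=combined)

# Made-up documents, only for illustration.
docs_text = ["PDF preserves fonts and layout.", "Acrobat Reader is free."]
prompt_text = stuff_prompt(docs_text)
print(prompt_text)
```

Because everything lands in one prompt, this approach only works while the combined text fits in the model's context window.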

In [220]:
!pip install --quiet PyPDF2

In [221]:
from PyPDF2 import PdfReader
# Path to the PDF file
pdfreader = PdfReader('pdf-sample.pdf')

In [222]:
# read text from pdf file
text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        text += content

In [223]:
text

"Adobe Acrobat PDF Files\nAdobe® Portable Document Format (PDF) is a universal file format that preserves all\nof the fonts, formatting, colours and graphics of any source document, regardless ofthe application and platform used to create it.\nAdobe PDF is an ideal format for electronic document distribution as it overcomes the\nproblems commonly encountered with electronic file sharing.\n• Anyone, anywhere  can open a PDF file. All you need is the free Adobe Acrobat\nReader. Recipients of other file formats sometimes can't open files because they\ndon't have the applications used to create the documents.\n• PDF files always print correctly  on any printing device.\n• PDF files always display exactly  as created, regardless of fonts, software, and\noperating systems. Fonts, and graphics are not lost due to platform, software, and\nversion incompatibilities.\n• The free Acrobat Reader is easy to download and can be freely distributed by\nanyone.\n• Compact PDF files are smaller than the

In [224]:
from langchain.docstore.document import Document
# Load document
docs = [Document(page_content=text)]
docs

[Document(metadata={}, page_content="Adobe Acrobat PDF Files\nAdobe® Portable Document Format (PDF) is a universal file format that preserves all\nof the fonts, formatting, colours and graphics of any source document, regardless ofthe application and platform used to create it.\nAdobe PDF is an ideal format for electronic document distribution as it overcomes the\nproblems commonly encountered with electronic file sharing.\n• Anyone, anywhere  can open a PDF file. All you need is the free Adobe Acrobat\nReader. Recipients of other file formats sometimes can't open files because they\ndon't have the applications used to create the documents.\n• PDF files always print correctly  on any printing device.\n• PDF files always display exactly  as created, regardless of fonts, software, and\noperating systems. Fonts, and graphics are not lost due to platform, software, and\nversion incompatibilities.\n• The free Acrobat Reader is easy to download and can be freely distributed by\nanyone.\n• Co

In [225]:
# Instantiate model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.7, top_p=0.85)

In [226]:
# Check total tokens
llm.get_num_tokens(text)

218

In [227]:
# Define template and Instantiate prompt
template = "Write a concise summary of the following: `{text}`"

prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [228]:
# Instantiate chain
from langchain.chains.summarize import load_summarize_chain
chain = load_summarize_chain(
    llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)
summary = chain.invoke(docs)

In [229]:
# Model response
summary

{'input_documents': [Document(metadata={}, page_content="Adobe Acrobat PDF Files\nAdobe® Portable Document Format (PDF) is a universal file format that preserves all\nof the fonts, formatting, colours and graphics of any source document, regardless ofthe application and platform used to create it.\nAdobe PDF is an ideal format for electronic document distribution as it overcomes the\nproblems commonly encountered with electronic file sharing.\n• Anyone, anywhere  can open a PDF file. All you need is the free Adobe Acrobat\nReader. Recipients of other file formats sometimes can't open files because they\ndon't have the applications used to create the documents.\n• PDF files always print correctly  on any printing device.\n• PDF files always display exactly  as created, regardless of fonts, software, and\noperating systems. Fonts, and graphics are not lost due to platform, software, and\nversion incompatibilities.\n• The free Acrobat Reader is easy to download and can be freely distribut

In [230]:
# Summary
summary["output_text"]

"Adobe PDF is a universal file format that preserves a document's original formatting across different platforms and software.  It allows anyone with the free Acrobat Reader to view and print files exactly as intended, regardless of the original creation software.  PDFs are also compact for efficient web sharing."

### Map Reduce Large Documents Text Summarization

The map-reduce chain summarizes each document individually and then aggregates the partial summaries into a final summary.
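
The two steps can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `first_sentence` and `join_summaries` are made-up stand-ins for the model calls that a real chain would make in the map and reduce steps.

```python
def first_sentence(text):
    """Stand-in for an LLM call: keep only the first sentence of `text`."""
    return text.split(". ")[0].rstrip(".") + "."

def join_summaries(partials):
    """Stand-in for the reduce call: concatenate the partial summaries."""
    return " ".join(partials)

def map_reduce_summarize(chunk_texts, summarize_chunk, combine_summaries):
    # Map step: summarize every chunk independently of the others.
    partial_summaries = [summarize_chunk(chunk) for chunk in chunk_texts]
    # Reduce step: merge the partial summaries into one result.
    return combine_summaries(partial_summaries)

mr_result = map_reduce_summarize(
    [
        "Gemini is a family of models. It was announced in 2023.",
        "Gemini is multimodal. It handles text and images.",
    ],
    first_sentence,
    join_summaries,
)
print(mr_result)
```

Since the map calls do not depend on each other, they can run in parallel, at the cost of each chunk being summarized without knowledge of the others.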

In [254]:
from PyPDF2 import PdfReader
# Path to the PDF file
pdfreader = PdfReader('gemini.pdf')
# read text from pdf file
text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        text += content

In [255]:
text

'Gemini (language model)\nGemini (formerly known as Bard) is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, \nGemini Pro, Gemini Flash, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI\'s GPT-4. It powers the chatbot of the same name. \nHistory\nDevelopment\nGoogle announced Gemini, a large language model (LLM) developed by subsidiary Google DeepMind, during the Google I/O keynote on May 10, 2023. It was positioned as a more powerful successor \nto PaLM 2, which was also unveiled at the event, with Google CEO Sundar Pichai stating that Gemini was still in its early developmental stages.[1][2] Unlike other LLMs, Gemini was said to be unique in \nthat it was not trained on a text corpus alone and was designed to be multimodal, meaning it could process multiple types of data simultaneously, including text, images, audio, video, and computer code.\n[

In [256]:
from langchain_google_genai import ChatGoogleGenerativeAI
# Instantiate model
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.7, top_p=0.85)

In [257]:
# Get total tokens
llm.get_num_tokens(text)

2362

In [258]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=20)
chunks = text_splitter.create_documents([text])

In [259]:
len(chunks)

1
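
A result of 1 means the whole text fit inside a single 10,000-character chunk, so the chains that follow process only one document. The effect of `chunk_size` on the chunk count can be sketched with a naive fixed-window splitter. This is a simplification: `split_fixed` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it does not try to break on separators such as paragraphs or sentences.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Naive character splitter: fixed-size windows that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return pieces

sample = "x" * 9000  # made-up text of 9,000 characters
for size in (10000, 2000):
    print(size, len(split_fixed(sample, size, chunk_overlap=20)))
```

A smaller `chunk_size` yields more chunks, which is what gives the map-reduce and refine chains several documents to work through.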

In [260]:
from langchain.chains.summarize import load_summarize_chain
# Instantiate chain
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)
summary = chain.invoke(chunks)

In [261]:
summary

{'input_documents': [Document(metadata={}, page_content='Gemini (language model)\nGemini (formerly known as Bard) is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, \nGemini Pro, Gemini Flash, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI\'s GPT-4. It powers the chatbot of the same name. \nHistory\nDevelopment\nGoogle announced Gemini, a large language model (LLM) developed by subsidiary Google DeepMind, during the Google I/O keynote on May 10, 2023. It was positioned as a more powerful successor \nto PaLM 2, which was also unveiled at the event, with Google CEO Sundar Pichai stating that Gemini was still in its early developmental stages.[1][2] Unlike other LLMs, Gemini was said to be unique in \nthat it was not trained on a text corpus alone and was designed to be multimodal, meaning it could process multiple types of data simultaneously, inc

In [262]:
summary["output_text"]

"Google DeepMind's Gemini, a multimodal LLM family succeeding LaMDA and PaLM 2, launched in late 2023 with various versions (Ultra, Pro, Nano, Flash).  It rivals GPT-4, powers Google products, and excels in multimodal processing, outperforming competitors on key benchmarks.  Development continues with improved versions and open-source Gemma models."

### Map Reduce With Custom Prompts

In [263]:
# Define chunk template and Instantiate prompt
template="""
Please summarize the following:\n`{text}`
Summary:
"""
prompt_template=PromptTemplate(input_variables=['text'],
                                    template=template)

In [264]:
# Define final template and Instantiate prompt
final_template="""
Please provide a final summary of the text with these important points.
Add a title.
Start the summary with an introduction.
Provide the summary as numbered points.
Summary: `{text}`
"""
final_prompt_template=PromptTemplate(input_variables=['text'],
                                             template=final_template)

In [267]:
from langchain.chains.summarize import load_summarize_chain
# Instantiate chain
chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=prompt_template,
    combine_prompt=final_prompt_template,
    verbose=False
)
summary = chain.invoke(chunks)

In [268]:
# Model response
summary

{'input_documents': [Document(metadata={}, page_content='Gemini (language model)\nGemini (formerly known as Bard) is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, \nGemini Pro, Gemini Flash, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI\'s GPT-4. It powers the chatbot of the same name. \nHistory\nDevelopment\nGoogle announced Gemini, a large language model (LLM) developed by subsidiary Google DeepMind, during the Google I/O keynote on May 10, 2023. It was positioned as a more powerful successor \nto PaLM 2, which was also unveiled at the event, with Google CEO Sundar Pichai stating that Gemini was still in its early developmental stages.[1][2] Unlike other LLMs, Gemini was said to be unique in \nthat it was not trained on a text corpus alone and was designed to be multimodal, meaning it could process multiple types of data simultaneously, inc

In [270]:
# Print summary
print(summary["output_text"])

Title: Gemini: Google's Multimodal LLM Powerhouse

Introduction: Gemini is Google's answer to the rapidly evolving landscape of large language models (LLMs). Developed by Google DeepMind, this family of multimodal models aims to surpass its predecessors and compete with leading models like OpenAI's GPT-4.  Since its launch in December 2023, Gemini has undergone significant development, showcasing impressive capabilities and continuous improvements.

Summary:

1. **Multimodal Mastery:** Gemini's core strength lies in its multimodal processing, handling text, images, audio, video, and code. This allows for richer and more versatile interactions compared to text-only models.

2. **Scalable Sizes:**  Gemini is available in various sizes, catering to different needs and resource constraints. These include Ultra (the most powerful), Pro, Flash, and Nano (designed for on-device applications).

3. **Collaborative Development:** Gemini's development benefited from the combined expertise of Deep

### RefineChain Text Summarization

The refine chain iteratively improves a summary by feeding the previous result back to the model together with the next document.
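
The loop behind this strategy can be sketched in plain Python. This is an illustration under stated assumptions, not LangChain's implementation: `initial_stub` and `refine_stub` are made-up stand-ins for the two model calls, and the sample chunks are invented.

```python
def refine_summarize(chunk_texts, initial, refine):
    """Summarize the first chunk, then update that summary with each later chunk."""
    if not chunk_texts:
        return ""
    running = initial(chunk_texts[0])
    for chunk in chunk_texts[1:]:
        # Each step sees the summary so far plus one new chunk.
        running = refine(running, chunk)
    return running

# Stand-ins for LLM calls, only for illustration.
def initial_stub(chunk):
    return chunk.split(".")[0] + "."

def refine_stub(existing, chunk):
    return existing + " " + chunk.split(".")[0] + "."

refined = refine_summarize(
    ["Gemini is multimodal. More detail.", "It comes in four sizes. More detail."],
    initial_stub,
    refine_stub,
)
print(refined)
```

Unlike map-reduce, the steps are sequential: every call depends on the previous result, so the chunks cannot be processed in parallel.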

In [271]:
from langchain.chains.summarize import load_summarize_chain
# Instantiate chain
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    verbose=True
)
summary = chain.invoke(chunks)



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Gemini (language model)
Gemini (formerly known as Bard) is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, 
Gemini Pro, Gemini Flash, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the chatbot of the same name. 
History
Development
Google announced Gemini, a large language model (LLM) developed by subsidiary Google DeepMind, during the Google I/O keynote on May 10, 2023. It was positioned as a more powerful successor 
to PaLM 2, which was also unveiled at the event, with Google CEO Sundar Pichai stating that Gemini was still in its early developmental stages.[1][2] Unlike other LLMs, Gemini was said to be unique in 
that it was not trained on a text

In [192]:
summary

{'input_documents': [Document(metadata={}, page_content='Gemini (language model)\nGemini (formerly known as Bard) is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, \nGemini Pro, Gemini Flash, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI\'s GPT-4. It powers the chatbot of the same name. \nHistory\nDevelopment\nGoogle announced Gemini, a large language model (LLM) developed by subsidiary Google DeepMind, during the Google I/O keynote on May 10, 2023. It was positioned as a more powerful successor \nto PaLM 2, which was also unveiled at the event, with Google CEO Sundar Pichai stating that Gemini was still in its early developmental stages.[1][2] Unlike other LLMs, Gemini was said to be unique in \nthat it was not trained on a text corpus alone and was designed to be multimodal, meaning it could process multiple types of data simultaneously, inc

In [193]:
print(summary["output_text"])

The Rise of Gemini: Google's Multimodal LLM

Introduction:
Gemini represents Google's ambitious foray into the realm of multimodal large language models (LLMs), aiming to surpass competitors like OpenAI's GPT-4. This summary chronicles Gemini's development, from its initial announcement to its current iteration, highlighting key milestones and features.

1. Initial Promise (May 2023):
   - Announced as the successor to PaLM 2, boasting multimodal capabilities (text, images, audio, video, code).
   - Jointly developed by the newly merged DeepMind and Google Brain.
   - Positioned as a more powerful and versatile LLM.

2. Building Anticipation (August-November 2023):
   - Targeted a late 2023 launch, emphasizing combined text and image generation.
   - Sergey Brin's involvement highlighted the project's importance.
   - Training data included YouTube transcripts, necessitating legal review.
   - OpenAI responded by accelerating its own multimodal development.
   - Early access granted to

### Refine with Custom template

In [287]:
prompt_template = """Please write a summary of the following:
{text}
Summary:"""
prompt = PromptTemplate.from_template(prompt_template)

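# The refine step needs the summary so far ({existing_answer}) as well as the new text
refine_template = """
Here is the existing summary: `{existing_answer}`
Please provide a final summary that refines it with these important points from the new text.
New text: `{text}`
"""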
refine_prompt = PromptTemplate.from_template(refine_template)
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
)
summary = chain.invoke(chunks)

In [288]:
print(summary["output_text"])

Gemini is Google's family of multimodal large language models (LLMs), developed by Google DeepMind to succeed LaMDA and PaLM 2.  Launched in December 2023, it competes with OpenAI's GPT-4 and powers Google's chatbot (also named Gemini).  Key features include its multimodal capabilities (processing text, images, audio, video, and code), its development by the merged Google DeepMind team, and its training on a massive, diverse dataset including YouTube transcripts.

Gemini has several versions: Ultra (most powerful), Pro (wide range of tasks), Flash (faster, on-device tasks), and Nano (for smartphones).  It has achieved benchmark performance exceeding GPT-4 and other LLMs, even surpassing human experts on the MMLU test.  Gemini has been integrated into various Google products like Bard, Pixel phones, Search, and Google Cloud services.

The model has seen continuous updates, with Gemini 1.5 introducing a larger context window and improved architecture. Google also released Gemma, an open-