# Summarize Google News Results with LangChain🦜🔗, Huggingface🤗 and Serper API

## Overview

Text summarization is the process of creating a shorter version of a text document while still preserving the most important information. This can be useful for a variety of purposes, such as quickly skimming a long document, getting the gist of an article, or sharing a summary with others. LLMs can be used to create summaries of news articles, research papers, technical documents, and other types of text.

<img src="images/miztiik_text_summarization_01.png" width="50%"/>


## Chunking Strategies for LLM Applications

- **Stuffing method** - The `stuff` method is the simplest way to summarize text: the entire document is fed to a large language model (LLM) in a single call. This method has both pros and cons.

  - **Pros**:
    - It requires only a single call to the model, which can be faster than methods that require multiple calls.
    - The model sees all of the data at once, which can result in a better summary.
  - **Cons**:
    - Every model has a limited context length. A large document (or many documents) produces a prompt longer than that limit, so the call fails.
    - As a result, this method only works on smaller pieces of text and is usually not suitable for large documents.

- **MapReduce method** - A technique for summarizing large pieces of text by first summarizing smaller chunks and then combining those summaries into a single summary, i.e. a multi-stage summarization. In LangChain, `load_summarize_chain` builds a `MapReduceDocumentsChain` when you set `chain_type="map_reduce"`.
  - MapReduce with Overlapping Chunks method
  - MapReduce with Rolling Summary method

  <img src="images/miztiik_automation_docs_copilot_using_llm_rag_02.png" width="50%"/>
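
The sketch below makes the difference between these strategies concrete without using any framework. It shows stuffing, MapReduce over overlapping chunks, and a rolling summary. It assumes a hypothetical `summarize` callable that stands in for an LLM call. It also uses a character budget (`max_chars`) as a stand-in for the model's context length. This is an illustration under those assumptions, not LangChain's implementation.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes text, returns a summary
Summarizer = Callable[[str], str]


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def stuff_summarize(text: str, summarize: Summarizer, max_chars: int) -> str:
    """Stuffing: one call with the whole document, only if it fits the (assumed) budget."""
    if len(text) > max_chars:
        raise ValueError(f"document is {len(text)} chars, budget is {max_chars}")
    return summarize(text)


def map_reduce_summarize(chunks: List[str], summarize: Summarizer) -> str:
    """MapReduce: summarize each chunk independently, then summarize the joined summaries."""
    partial = [summarize(chunk) for chunk in chunks]  # map step, independent per chunk
    return summarize("\n".join(partial))              # reduce step


def rolling_summarize(chunks: List[str], summarize: Summarizer) -> str:
    """Rolling summary: carry a running summary forward and fold in one chunk at a time."""
    running = ""
    for chunk in chunks:
        running = summarize(f"{running}\n{chunk}".strip())
    return running
```

In LangChain, the first two correspond roughly to `chain_type="stuff"` and `chain_type="map_reduce"` in `load_summarize_chain`. The rolling summary is close in spirit to `chain_type="refine"`. The map step can run in parallel because each chunk is independent, while the rolling summary is inherently sequential.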


Because we do not know the length of each article in advance, we use the `map_reduce` method to summarize the news articles.

In this notebook, we fetch the latest Google News results using the Serper API and generate AI summaries with the LangChain framework, using both OpenAI models and Hugging Face transformers.


<a href="https://colab.research.google.com/github/miztiik/llm-bootcamp/blob/main/chapters/text_summarization/news_summarization_with_hf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
# Comment the above line to see the installation logs

# Install the dependencies
!pip install -qU python-dotenv
!pip install -qU langchain-core==0.1.23
!pip install -qU langchain==0.1.6
!pip install -qU langchain-community==0.0.19
!pip install -qU langchain-openai
!pip install -qU transformers --quiet
!pip install -qU newspaper3k

# langchain==0.1.6
# langchain-community==0.0.19
# langchain-core==0.1.23

In [2]:
# Load environment variables
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

True

In [3]:
# Not a good practice, but we silence this warning in this notebook: PyTorch has deprecated TypedStorage (to be removed in a future version) and the warning is noisy.
# https://github.com/pytorch/pytorch/issues/97207#issuecomment-1494781560
import warnings

# warnings.filterwarnings('ignore')
warnings.filterwarnings(
    "ignore", category=UserWarning, message="TypedStorage is deprecated"
)

Add your API keys to the `.env` file. You can get them from the following links. _Note: Some of these services require an account, and some may charge for usage._
- [OpenAI API Key](https://platform.openai.com/account/api-keys)
- [Hugging Face API Key](https://huggingface.co/settings/tokens)
- [Serper API Key](https://serper.dev/api-key)
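
After `load_dotenv(...)` has run, a quick check can catch a missing key before the first API call fails. Below is a minimal sketch. The variable names are the ones used in the commented-out environment cell of this notebook, and the helper `missing_api_keys` is hypothetical.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Names taken from the notebook's commented-out environment cell (assumed to be the ones you need)
REQUIRED_KEYS = ("OPENAI_API_KEY", "HUGGINGFACEHUB_API_TOKEN", "SERPER_API_KEY")


def missing_api_keys(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the required key names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Usage (after load_dotenv):
# missing = missing_api_keys()
# if missing:
#     print(f"Set these in your .env file: {', '.join(missing)}")
```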

In [4]:
# os.environ["HF_TOKEN"] = ""
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = ""
# os.environ["OPENAI_API_KEY"] = ""
# os.environ["SERPER_API_KEY"] = ""

In [5]:
from langchain_openai import OpenAI
from langchain_openai import ChatOpenAI

# To specify a particular model refer to the OpenAI documentation - https://platform.openai.com/docs/models
# Completions Model: https://platform.openai.com/docs/models/completions
# Chat Models: https://platform.openai.com/docs/models

llm = OpenAI()
llm_chat = ChatOpenAI(model_name="gpt-3.5-turbo-0125", temperature=0.3)

In [6]:
from langchain_community.utilities import GoogleSerperAPIWrapper
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import NewsURLLoader
import textwrap

**Serper API** - [Sign up](https://serper.dev/signup?ref=miztiik) for an account with Serper, or log in if you already have an account, and create an API key. Serper offers a generous free tier; as you consume the API, the dashboard will populate with the requests and remaining credits.

In [7]:
import os

# `k` sets how many results the wrapper requests from Serper
search = GoogleSerperAPIWrapper(
    type="news", tbs="qdr:d1", k=5, serper_api_key=os.getenv("SERPER_API_KEY")
)

news_search_query = "india ai growth"
news_results = search.results(news_search_query)

if news_results.get("news") is None:
    print("No results found")
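
`GoogleSerperAPIWrapper` hides the underlying HTTP call. For reference, here is a minimal standard-library sketch of an equivalent raw request. The endpoint URL, the `X-API-KEY` header and the payload fields (`q`, `num`, `tbs`) are assumptions based on Serper's public documentation, so check them there before relying on them. `build_serper_news_request` is a hypothetical helper that only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: Serper's news endpoint, per Serper's public docs
SERPER_NEWS_URL = "https://google.serper.dev/news"


def build_serper_news_request(
    query: str, api_key: str, num: int = 5, tbs: str = ""
) -> urllib.request.Request:
    """Build, but do not send, a POST request for a Serper news search."""
    payload = {"q": query, "num": num}
    if tbs:
        payload["tbs"] = tbs  # Google time filter, e.g. the "qdr:d1" value used above
    return urllib.request.Request(
        SERPER_NEWS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},  # assumed header names
        method="POST",
    )


# Usage (needs network access and a valid key):
# req = build_serper_news_request("india ai growth", os.environ["SERPER_API_KEY"], tbs="qdr:d1")
# with urllib.request.urlopen(req) as resp:
#     news = json.load(resp).get("news", [])
```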

In [None]:
print(f"total_no_news_articles: {len(news_results['news'])}")

# Let's take a look at one of the news items
for i in news_results["news"][0]:
    print(f"{i}:{news_results['news'][0][i]}")

In [10]:
text_splitter = RecursiveCharacterTextSplitter(
    separators=[
        "\n\n",
        "\n",
    ],
    chunk_size=1000,
    chunk_overlap=100,
)

# For each news article, load the contents and keep only the articles that loaded.
# Building a new list avoids skipping items, which happens when popping from a list while iterating over it.
loaded_news = []
for news_item in news_results["news"]:
    loader = NewsURLLoader(urls=[news_item.get("link")])
    contents = loader.load()
    if contents:
        news_item["article"] = contents
        # Split the article into chunks that fit the model input size
        news_item["split_article"] = text_splitter.create_documents(
            [contents[0].page_content]
        )
        loaded_news.append(news_item)
    else:
        print(f"Failed to load {news_item['link']}, removed from results.\n")

news_results["news"] = loaded_news

# Limit how many news articles to process
num_results = min(5, len(news_results["news"]))
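
`RecursiveCharacterTextSplitter` tries the separators in order. It splits on the first one, packs the pieces into chunks up to `chunk_size`, and re-splits any piece that is still too long using the next separator. The sketch below illustrates that idea under simplifying assumptions: it has no `chunk_overlap`, it measures length in characters, and the name `recursive_split` is hypothetical. When no separator is left, it cuts fixed-size windows, similar in spirit to the empty-string separator at the end of LangChain's default list (`"\n\n"`, `"\n"`, `" "`, `""`). With only `"\n\n"` and `"\n"`, as in the cell above, a long paragraph without line breaks may end up as a single oversized chunk.

```python
from typing import List


def recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Simplified recursive character splitting: no overlap, lengths counted in characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Out of separators: cut hard fixed-size windows so no chunk exceeds chunk_size
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, remaining = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            # Flush the buffer, then split the oversized piece with the next separator
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, remaining, chunk_size))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```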

In [None]:
print(f"total_no_news_articles: {len(news_results['news'])}")

# List all news article links
for i in news_results["news"]:
    # print(i)
    print(i["link"])
    print(f"\033[32m-----\033[0m")

## Summarization with OpenAI Models

<img src="images/miztiik_text_summarization_02.png" width="50%"/>


In [13]:
%%time
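# gpt-3.5-turbo-0125 has a 16k-token context window; the default OpenAI() completions model used below has a smaller one.
# To check whether a document fits before choosing a chain type, count its tokens:
# num_tokens_first_doc = llm.get_num_tokens(
#     news_results["news"][1]["article"][0].page_content
# )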

map_prompt = """Identify main themes to write a concise summary of the following:
"{text}"
CONCISE SUMMARY:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = """Write a succinct summary of the following text delimited by triple backquotes.
```{text}```
succinct SUMMARY:
"""
combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text"]
)

oai_chain = load_summarize_chain(
    llm=llm,
    chain_type="map_reduce",
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    # Uncomment verbose=True if you want to see the prompts being used
    # verbose=True
)

for news_item in news_results["news"][:num_results]:
    if news_item.get("article"):
        print(
            f"Summarizing article: {news_item['title']} - {news_item['link']}\n")
        news_item["oai_summary"] = oai_chain.invoke(news_item["split_article"])

Summarizing article: IMF urges G20 cooperation on climate, global AI principles - https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms

Summarizing article: Micron Begins Production of HBM3E Chips to Accelerate AI Growth - https://analyticsindiamag.com/micron-begins-production-of-hbm3e-chips-to-accelerate-ai-growth/

Summarizing article: Demand for AI talent growing at 15 percent in India: Nasscom BCG report - https://www.storyboard18.com/how-it-works/demand-for-ai-talent-growing-at-15-percent-in-india-nasscom-bcg-report-24468.htm

Summarizing article: IIT Madras researchers develop first India-specific AI model to determine age of foetus - https://www.deccanherald.com/science/iit-madras-researchers-develop-first-india-specific-ai-model-to-determine-age-of-foetus-2910363

Summarizing article: A for artificial intelligence, B for base effect: The A-Z of Q3 earnings season - https://www.tradingview.com
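
The commented-out `get_num_tokens` call in the cell above points to a useful check. Stuffing needs only one call when a document fits, and MapReduce is only needed when it does not. Below is a rough sketch of that decision. The 4-characters-per-token estimate and the 500-token reserve for the prompt and the generated summary are assumptions, and `choose_chain_type` is a hypothetical helper. For an exact count, use the model's own tokenizer (for example `llm.get_num_tokens`).

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; ~4 characters per token is a common rule of thumb for English (assumption)."""
    return math.ceil(len(text) / chars_per_token)


def choose_chain_type(
    text: str, context_tokens: int, reserved_tokens: int = 500, chars_per_token: float = 4.0
) -> str:
    """Pick "stuff" when the text plus a reserve for prompt and output fits the context window."""
    budget = context_tokens - reserved_tokens
    return "stuff" if estimate_tokens(text, chars_per_token) <= budget else "map_reduce"


# Usage (hypothetical, with the 16k window mentioned above):
# chain_type = choose_chain_type(news_item["article"][0].page_content, context_tokens=16_000)
```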

Let's look at the summaries generated by the `map_reduce` chain.

In [14]:
for i in news_results["news"][:num_results]:
    if i.get("article"):
        print(
            f"\nTitle: {i['title']}\nLink: {i['link']}\nSummary: \033[32m{i['oai_summary']['output_text']}\033[0m"
        )


Title: IMF urges G20 cooperation on climate, global AI principles
Link: https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms
Summary:
The International Monetary Fund advises G20 countries to address climate change, avoid trade restrictions, and adopt AI principles to improve global growth prospects. They also recommend rebuilding fiscal buffers, broadening tax bases, and reducing inequality and promoting sustainability.

Title: Micron Begins Production of HBM3E Chips to Accelerate AI Growth
Link: https://analyticsindiamag.com/micron-begins-production-of-hbm3e-chips-to-accelerate-ai-growth/
Summary:
Micron Technology has announced the production of their HBM3E solution, offering 24GB of memory and superior performance for AI applications. It has a pin speed of 9.2 Gb/s and a memory bandwidth of 1.2 TB/s. Micron's HBM3E technology also boasts 30% lower power consumption and is ideal fo

### Observations

The summary reads as if written by a person and captures the main points of the article; it is coherent and reads well. As OpenAI continues to improve its models, the quality of these summaries is likely to improve as well.

## Summarization with Huggingface Open Source Hosted Models with LangChain

<img src="images/miztiik_text_summarization_03.png" width="50%"/>

We will try a variety of models and see how they perform. We will use:
- `google/flan-t5-xxl`
- `facebook/bart-large-cnn`
- `sshleifer/distilbart-cnn-12-6`
- `Falconsai/text_summarization`

### Summarization with Hugging Face hosted models

In [15]:
from langchain.chains.summarize import load_summarize_chain
from langchain_community.llms.huggingface_hub import HuggingFaceHub
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

hf_flan_llm = HuggingFaceHub(
    repo_id="google/flan-t5-xxl",
    model_kwargs={"temperature": 0.3, "max_length": 1024},
    # repo_id="philschmid/bart-large-cnn-samsum", model_kwargs={"temperature": 0.3, "max_length": 256}
    # repo_id="mistralai/Mistral-7B-v0.1", model_kwargs={"temperature": 0.3, "max_length": 1024}
)

map_prompt = """Identify main themes to write a concise summary of the following:
"{text}"
CONCISE SUMMARY:
"""
map_prompt_template = PromptTemplate(
    template=map_prompt, input_variables=["text"])

combine_prompt = """Write a succinct summary of the following text delimited by triple backquotes.
```{text}```
succinct SUMMARY:
"""
combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text"]
)


hf_flan_chain = load_summarize_chain(
    llm=hf_flan_llm,
    chain_type="map_reduce",
    token_max=900,  # https://github.com/langchain-ai/langchain/discussions/10930
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=True,
)

# textwrap.fill(output_summary, width=100)

for news_item in news_results["news"][:num_results]:
    if news_item.get("article"):
        print(
            f"Summarizing article: {news_item['title']} - {news_item['link']}\n")
        news_item["hf_flan_summary"] = hf_flan_chain.invoke(
            news_item["split_article"])

Summarizing article: IMF urges G20 cooperation on climate, global AI principles - https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Identify main themes to write a concise summary of the following:
"SAO PAULO: Medium-term global growth prospects are the weakest in decades, but G20 major economies could boost growth prospects if they work together to address climate change , avoid trade restrictions and adopt worldwide principles for artificial intelligence (AI), the International Monetary Fund said. IMF Managing Director Kristalina Georgieva urged G20 major economies to "act boldly" to rebuild policy momentum on reforms after years of "firefighting" in the wake of economic shocks caused by the COVID-19 pandemic and the war in Ukraine.With global growth expecte

Token indices sequence length is longer than the specified maximum sequence length for this model (1061 > 1024). Running this sequence through the model will result in indexing errors



> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a succinct summary of the following text delimited by triple backquotes.
```The Q3 earnings season is done and dusted, with sectors like industrials, auto, energy and financials emerging as the outperformers. In contrast, consumer staples and technology companies posted lacklustre numbers. Some names shone brighter than others. And quite a few delivered much-needed lessons in sobriety for investors drunk on this seemingly unstoppable bull run. All in all, to borrow one of the favourite catchphrases of market experts

The aggregate earnings of BSE-500 companies were robust and resilient. While PAT (profit after tax) growth was strong at 26%, it was down sequentially (from 58% in 2Q FY24), as the positive base effect is now dwindling, according to Emkay Research.

India’s top four IT companies hired a paltry 1,940 employee
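
`token_max=900` matters in the reduce step. If the mapped summaries together exceed `token_max`, LangChain groups them, summarizes each group again, and repeats until they fit, before it runs the combine prompt (see the linked discussion). Below is a simplified sketch of that collapse loop; it is not LangChain's code. `length_fn` stands in for a token counter and `combine` for a model call. Both are hypothetical, as is the `max_rounds` guard.

```python
from typing import Callable, List

LengthFn = Callable[[str], int]  # hypothetical stand-in for a token counter


def group_by_budget(texts: List[str], length_fn: LengthFn, budget: int) -> List[List[str]]:
    """Greedily pack consecutive texts into groups whose total length stays within budget."""
    groups: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for text in texts:
        n = length_fn(text)
        if current and current_len + n > budget:
            groups.append(current)
            current, current_len = [], 0
        current.append(text)  # a single oversized text still forms its own group
        current_len += n
    if current:
        groups.append(current)
    return groups


def collapse(
    texts: List[str],
    combine: Callable[[List[str]], str],  # hypothetical stand-in for an LLM summarization call
    length_fn: LengthFn,
    budget: int,
    max_rounds: int = 10,
) -> List[str]:
    """Re-summarize groups of texts until their combined length fits the budget."""
    rounds = 0
    while sum(length_fn(t) for t in texts) > budget:
        if rounds == max_rounds:
            raise RuntimeError("summaries did not shrink below the budget")
        texts = [combine(group) for group in group_by_budget(texts, length_fn, budget)]
        rounds += 1
    return texts
```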

In [16]:
# Print the summaries
for i in news_results["news"][:num_results]:
    if i.get("article"):
        print(f"\nTitle: {i['title']}\nLink: {i['link']}")
        print(
            f"\noai_summary: \033[32m {i['oai_summary']['output_text']}\033[0m")
        print(
            f"\nhf_flan_summary: \033[32m{i['hf_flan_summary']['output_text']}\033[0m"
        )


Title: IMF urges G20 cooperation on climate, global AI principles
Link: https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms

oai_summary:
The International Monetary Fund advises G20 countries to address climate change, avoid trade restrictions, and adopt AI principles to improve global growth prospects. They also recommend rebuilding fiscal buffers, broadening tax bases, and reducing inequality and promoting sustainability.

hf_flan_summary: The International Monetary Fund (IMF) on Wednesday urged the G20 to work together to address climate change, avoid trade restrictions and adopt worldwide principles for artificial intelligence.

Title: Micron Begins Production of HBM3E Chips to Accelerate AI Growth
Link: https://analyticsindiamag.com/micron-begins-production-of-hbm3e-chips-to-accelerate-ai-growth/

oai_summary:
Micron Technology has announced the production of their H

### Summarization with Huggingface Open Source Local Models with LangChain

In [22]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers import pipeline

# Load the model and tokenizer

bart_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
bart_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

print(f"bart_model_max_model_length:{bart_tokenizer.model_max_length}")

bart_model_max_model_length:1024


In [36]:
%%time

bart_summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    tokenizer="facebook/bart-large-cnn",
)

hf_bart_llm = HuggingFacePipeline(pipeline=bart_summarizer, model_kwargs={})

map_prompt = """Identify main themes to write a concise summary of the following:
"{text}"
CONCISE SUMMARY:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = """Write a succinct summary of the following text delimited by triple backquotes.
```{text}```
succinct SUMMARY:
"""
combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text"]
)

hf_bart_chain = load_summarize_chain(
    llm=hf_bart_llm,
    chain_type="map_reduce",
    token_max=900,  # https://github.com/langchain-ai/langchain/discussions/10930
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    # verbose=True
)

# run chain
for news_item in news_results["news"][:num_results]:
    if news_item.get("article"):
        print(
            f"Summarizing article: {news_item['title']} - {news_item['link']}\n")
        news_item["hf_bart_summary"] = hf_bart_chain.invoke(
            news_item["split_article"])

Summarizing article: IMF urges G20 cooperation on climate, global AI principles - https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms



Your max_length is set to 142, but your input_length is only 101. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)


Summarizing article: Micron Begins Production of HBM3E Chips to Accelerate AI Growth - https://analyticsindiamag.com/micron-begins-production-of-hbm3e-chips-to-accelerate-ai-growth/



Your max_length is set to 142, but your input_length is only 138. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=69)
Your max_length is set to 142, but your input_length is only 129. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=64)


Summarizing article: Demand for AI talent growing at 15 percent in India: Nasscom BCG report - https://www.storyboard18.com/how-it-works/demand-for-ai-talent-growing-at-15-percent-in-india-nasscom-bcg-report-24468.htm



Your max_length is set to 142, but your input_length is only 61. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=30)


Summarizing article: IIT Madras researchers develop first India-specific AI model to determine age of foetus - https://www.deccanherald.com/science/iit-madras-researchers-develop-first-india-specific-ai-model-to-determine-age-of-foetus-2910363



Your max_length is set to 142, but your input_length is only 124. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=62)


Summarizing article: A for artificial intelligence, B for base effect: The A-Z of Q3 earnings season - https://www.tradingview.com/news/moneycontrol:62dbcc7f3094b:0-a-for-artificial-intelligence-b-for-base-effect-the-a-z-of-q3-earnings-season/

CPU times: total: 2min 15s
Wall time: 4min 8s
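
The warnings above come from the `transformers` summarization pipeline. Each map-step input is shorter than the default `max_length` (142 here), and the value each message suggests is half the input length. Below is a minimal sketch that derives per-input generation limits along those lines. The 0.5 ratio follows the warnings, while the `floor` value and the `min_length` rule are arbitrary assumptions. `length_budget` is a hypothetical helper.

```python
from typing import Tuple


def length_budget(
    input_length: int, default_max: int = 142, ratio: float = 0.5, floor: int = 10
) -> Tuple[int, int]:
    """Return (min_length, max_length) for summarizing an input of `input_length` tokens."""
    max_length = min(default_max, max(floor, int(input_length * ratio)))
    min_length = max(1, max_length // 4)  # assumption: keep min_length well below max_length
    return min_length, max_length


# Usage with the notebook's tokenizer and pipeline (not run here); `text` is whatever is sent to the pipeline:
# n_tokens = len(bart_tokenizer(text)["input_ids"])
# lo, hi = length_budget(n_tokens)
# bart_summarizer(text, min_length=lo, max_length=hi)
```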


In [27]:
# Print the summaries
for i in news_results["news"][:num_results]:
    if i.get("article"):
        print(f"\nTitle: {i['title']}\nLink: {i['link']}")
        print(
            f"\noai_summary: \033[32m {i['oai_summary']['output_text']}\033[0m")
        print(
            f"\nhf_flan_summary: \033[32m{i['hf_flan_summary']['output_text']}\033[0m"
        )
        print(
            f"\nhf_bart_summary: \033[32m{i['hf_bart_summary']['output_text']}\033[0m"
        )


Title: IMF urges G20 cooperation on climate, global AI principles
Link: https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms

oai_summary:
The International Monetary Fund advises G20 countries to address climate change, avoid trade restrictions, and adopt AI principles to improve global growth prospects. They also recommend rebuilding fiscal buffers, broadening tax bases, and reducing inequality and promoting sustainability.

hf_flan_summary: The International Monetary Fund (IMF) on Wednesday urged the G20 to work together to address climate change, avoid trade restrictions and adopt worldwide principles for artificial intelligence.

hf_bart_summary: Medium-term global growth prospects are the weakest in decades, the IMF says. But G20 major economies could boost growth prospects if they work together. IMF Managing Director Kristalina Georgieva urges G20 to "act boldly" on re

### Summarization with Huggingface Open Source Smaller Local Models with LangChain


In [28]:
hf_distilbart_summarizer = pipeline(
    "summarization", model="sshleifer/distilbart-cnn-12-6"
)

hf_distilbart_llm = HuggingFacePipeline(
    pipeline=hf_distilbart_summarizer, model_kwargs={}
)

map_prompt = """Identify main themes to write a concise summary of the following:
"{text}"
CONCISE SUMMARY:
"""
map_prompt_template = PromptTemplate(
    template=map_prompt, input_variables=["text"])

combine_prompt = """Write a succinct summary of the following text delimited by triple backquotes.
```{text}```
succinct SUMMARY:
"""
combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text"]
)


hf_distilbart_chain = load_summarize_chain(
    llm=hf_distilbart_llm,
    chain_type="map_reduce",
    token_max=900,  # https://github.com/langchain-ai/langchain/discussions/10930
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    # verbose=True
)


# run chain
for news_item in news_results["news"][:num_results]:
    if news_item.get("article"):
        print(
            f"Summarizing article: {news_item['title']} - {news_item['link']}\n")
        news_item["hf_distilbart_summary"] = hf_distilbart_chain.invoke(
            news_item["split_article"]
        )

Summarizing article: IMF urges G20 cooperation on climate, global AI principles - https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms



Your max_length is set to 142, but your input_length is only 102. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=51)


Summarizing article: Micron Begins Production of HBM3E Chips to Accelerate AI Growth - https://analyticsindiamag.com/micron-begins-production-of-hbm3e-chips-to-accelerate-ai-growth/



Your max_length is set to 142, but your input_length is only 138. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=69)
Your max_length is set to 142, but your input_length is only 129. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=64)


Summarizing article: Demand for AI talent growing at 15 percent in India: Nasscom BCG report - https://www.storyboard18.com/how-it-works/demand-for-ai-talent-growing-at-15-percent-in-india-nasscom-bcg-report-24468.htm



Your max_length is set to 142, but your input_length is only 61. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=30)


Summarizing article: IIT Madras researchers develop first India-specific AI model to determine age of foetus - https://www.deccanherald.com/science/iit-madras-researchers-develop-first-india-specific-ai-model-to-determine-age-of-foetus-2910363



Your max_length is set to 142, but your input_length is only 124. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=62)


Summarizing article: A for artificial intelligence, B for base effect: The A-Z of Q3 earnings season - https://www.tradingview.com/news/moneycontrol:62dbcc7f3094b:0-a-for-artificial-intelligence-b-for-base-effect-the-a-z-of-q3-earnings-season/



In [29]:
# Print the summaries
for i in news_results["news"][:num_results]:
    if i.get("article"):
        print(f"\nTitle: {i['title']}\nLink: {i['link']}")
        print(
            f"oai_summary: \033[32m {i['oai_summary']['output_text']}\033[0m")
        print(
            f"hf_flan_summary: \033[32m{i['hf_flan_summary']['output_text']}\033[0m")
        print(
            f"hf_bart_summary: \033[32m{i['hf_bart_summary']['output_text']}\033[0m")
        print(
            f"hf_distilbart_summary: \033[32m{i['hf_distilbart_summary']['output_text']}\033[0m"
        )


Title: IMF urges G20 cooperation on climate, global AI principles
Link: https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms
oai_summary:
The International Monetary Fund advises G20 countries to address climate change, avoid trade restrictions, and adopt AI principles to improve global growth prospects. They also recommend rebuilding fiscal buffers, broadening tax bases, and reducing inequality and promoting sustainability.
hf_flan_summary: The International Monetary Fund (IMF) on Wednesday urged the G20 to work together to address climate change, avoid trade restrictions and adopt worldwide principles for artificial intelligence.
hf_bart_summary: Medium-term global growth prospects are the weakest in decades, the IMF says. But G20 major economies could boost growth prospects if they work together. IMF Managing Director Kristalina Georgieva urges G20 to "act boldly" on refor

### Summarization with Huggingface Open Source Smaller Local Models with LangChain - Falconsai

In [30]:
%%time

hf_falconsai_summarizer = pipeline(
    "summarization", model="Falconsai/text_summarization"
)

hf_falconsai_llm = HuggingFacePipeline(
    pipeline=hf_falconsai_summarizer, model_kwargs={}
)

map_prompt = """Identify main themes to write a concise summary of the following:
"{text}"
CONCISE SUMMARY:
"""
map_prompt_template = PromptTemplate(
    template=map_prompt, input_variables=["text"])

combine_prompt = """Write a succinct summary of the following text delimited by triple backquotes.
```{text}```
succinct SUMMARY:
"""
combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text"]
)

hf_falconsai_chain = load_summarize_chain(
    llm=hf_falconsai_llm,
    chain_type="map_reduce",
    token_max=900,  # https://github.com/langchain-ai/langchain/discussions/10930
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    # verbose=True
)


# run chain
for news_item in news_results["news"][:num_results]:
    if news_item.get("article"):
        print(
            f"Summarizing article: {news_item['title']} - {news_item['link']}\n")
        news_item["hf_falconsai_summary"] = hf_falconsai_chain.invoke(
            news_item["split_article"]
        )

Token indices sequence length is longer than the specified maximum sequence length for this model (662 > 512). Running this sequence through the model will result in indexing errors


Summarizing article: IMF urges G20 cooperation on climate, global AI principles - https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms



Your max_length is set to 200, but your input_length is only 112. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=56)


Summarizing article: Micron Begins Production of HBM3E Chips to Accelerate AI Growth - https://analyticsindiamag.com/micron-begins-production-of-hbm3e-chips-to-accelerate-ai-growth/



Your max_length is set to 200, but your input_length is only 154. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=77)
Your max_length is set to 200, but your input_length is only 141. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=70)
Your max_length is set to 200, but your input_length is only 132. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=66)


Summarizing article: Demand for AI talent growing at 15 percent in India: Nasscom BCG report - https://www.storyboard18.com/how-it-works/demand-for-ai-talent-growing-at-15-percent-in-india-nasscom-bcg-report-24468.htm



Your max_length is set to 200, but your input_length is only 61. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=30)
Your max_length is set to 200, but your input_length is only 198. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=99)


Summarizing article: IIT Madras researchers develop first India-specific AI model to determine age of foetus - https://www.deccanherald.com/science/iit-madras-researchers-develop-first-india-specific-ai-model-to-determine-age-of-foetus-2910363



Your max_length is set to 200, but your input_length is only 129. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=64)


Summarizing article: A for artificial intelligence, B for base effect: The A-Z of Q3 earnings season - https://www.tradingview.com/news/moneycontrol:62dbcc7f3094b:0-a-for-artificial-intelligence-b-for-base-effect-the-a-z-of-q3-earnings-season/



Your max_length is set to 200, but your input_length is only 185. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=92)
Your max_length is set to 200, but your input_length is only 188. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=94)


CPU times: total: 25.6 s
Wall time: 1min 31s
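
The first warning in the output above shows that the Falconsai tokenizer has a 512-token limit, half of BART's 1024. Yet every model here shares the same 1000-character `chunk_size`. Below is a rough sketch for deriving a per-model `chunk_size` from a measured characters-per-token ratio. The 10% safety margin and the prompt overhead are assumptions, and `chunk_size_for_model` is a hypothetical helper.

```python
def chunk_size_for_model(
    model_max_tokens: int,
    chars_per_token: float,
    prompt_overhead_tokens: int = 0,
    safety: float = 0.9,  # assumption: leave ~10% headroom for tokenization variance
) -> int:
    """Largest chunk_size (in characters) expected to fit the model after the prompt overhead."""
    usable_tokens = (model_max_tokens - prompt_overhead_tokens) * safety
    if usable_tokens <= 0:
        raise ValueError("prompt overhead leaves no room for the chunk")
    return int(usable_tokens * chars_per_token)


# Usage (not run here): measure the ratio on a real article with the pipeline's own tokenizer
# tok = hf_falconsai_summarizer.tokenizer
# sample = news_results["news"][0]["article"][0].page_content
# ratio = len(sample) / len(tok(sample)["input_ids"])
# chunk_size = chunk_size_for_model(tok.model_max_length, ratio, prompt_overhead_tokens=40)
```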


In [32]:
# Print the summaries
for i in news_results["news"][:num_results]:
    if i.get("article"):
        print(f"\nTitle: {i['title']}\nLink: {i['link']}")
        print(
            f"oai_summary: \033[32m {i['oai_summary']['output_text']}\033[0m")
        print(
            f"hf_flan_summary: \033[32m{i['hf_flan_summary']['output_text']}\033[0m")
        print(
            f"hf_bart_summary: \033[32m{i['hf_bart_summary']['output_text']}\033[0m")
        print(
            f"hf_distilbart_summary: \033[32m{i['hf_distilbart_summary']['output_text']}\033[0m"
        )
        print(
            f"hf_falconsai_summary: \033[32m{i['hf_falconsai_summary']['output_text']}\033[0m"
        )


Title: IMF urges G20 cooperation on climate, global AI principles
Link: https://timesofindia.indiatimes.com/world/rest-of-world/imf-urges-g20-cooperation-on-climate-global-ai-principles/articleshow/108018065.cms
oai_summary:
The International Monetary Fund advises G20 countries to address climate change, avoid trade restrictions, and adopt AI principles to improve global growth prospects. They also recommend rebuilding fiscal buffers, broadening tax bases, and reducing inequality and promoting sustainability.
hf_flan_summary: The International Monetary Fund (IMF) on Wednesday urged the G20 to work together to address climate change, avoid trade restrictions and adopt worldwide principles for artificial intelligence.
hf_bart_summary: Medium-term global growth prospects are the weakest in decades, the IMF says. But G20 major economies could boost growth prospects if they work together. IMF Managing Director Kristalina Georgieva urges G20 to "act boldly" on refor
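
The print cells above grow by one line for every model added. A small helper can instead render whichever summaries exist for each article. The sketch below assumes the result keys used in this notebook and the `{"output_text": ...}` shape that `chain.invoke` returned here. `format_summaries` is hypothetical, and it drops the ANSI colors.

```python
from typing import Dict, Iterable, List, Sequence

# Result keys used in this notebook, in the order the models were run
SUMMARY_KEYS = (
    "oai_summary",
    "hf_flan_summary",
    "hf_bart_summary",
    "hf_distilbart_summary",
    "hf_falconsai_summary",
)


def format_summaries(news_items: Iterable[Dict], keys: Sequence[str] = SUMMARY_KEYS) -> str:
    """Render every available summary per article; models that were not run are skipped."""
    lines: List[str] = []
    for item in news_items:
        if not item.get("article"):
            continue  # the article text failed to load
        lines.append(f"Title: {item['title']}\nLink: {item['link']}")
        for key in keys:
            result = item.get(key)
            if result:
                lines.append(f"{key}: {result['output_text'].strip()}")
        lines.append("")  # blank line between articles
    return "\n".join(lines)


# Usage: print(format_summaries(news_results["news"][:num_results]))
```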

## Additional Reading

1. [LLM Bootcamp](https://github.com/miztiik/llm-bootcamp)
1. [Revolutionizing News Summarization](https://www.width.ai/post/revolutionizing-news-summarization-exploring-the-power-of-gpt-in-zero-shot-and-specialized-tasks)
1. [Summarizer For Any Size Document](https://www.width.ai/post/gpt3-summarizer)
1. [Langchain Summarization 1. Stuff & Map Reduce](https://python.langchain.com/docs/use_cases/summarization)
1. [Langchain Google Serper](https://python.langchain.com/docs/integrations/tools/google_serper)
1. [Hugging Face Local Pipelines](https://python.langchain.com/docs/integrations/llms/huggingface_pipelines)
1. [Chunking Strategies for LLM Applications](https://www.pinecone.io/learn/chunking-strategies/)
1. [Optimal Chunk-Size for Large Document Summarization](https://vectify.ai/blog/LargeDocumentSummarization)
1. [4 Powerful Long Text Summarization Methods With Real Examples](https://www.width.ai/post/4-long-text-summarization-methods)
1. [5 Levels Of Summarization: Novice to Expert](https://www.youtube.com/watch?v=qaPMdcCqtWk)
1. [Generating Summaries for Large Documents with Llama2 using Hugging Face and Langchain](https://medium.com/@ankit941208/generating-summaries-for-large-documents-with-llama2-using-hugging-face-and-langchain-f7de567339d2)
1. [Py-LangChain-PDF-Summary](https://github.com/dmitrimahayana/Py-LangChain-PDF-Summary/blob/master/02_RAG_GPT4ALL.py)
1. [Langchain Text Summarization with OpenAI](https://github.com/krishnaik06/Complete-Langchain-Tutorials/blob/main/Text%20summarization/summarization.ipynb)
