# Summarization using LLMs

#### Developed By: Manaranjan Pradhan
#### www.manaranjanp.com

*This Jupyter notebook is confidential and proprietary to Manaranjan Pradhan. It is intended solely for authorized training purposes. Unauthorized distribution, sharing, or reproduction of this notebook or its contents is strictly prohibited. This material is for personal learning within the training program only and may not be used for commercial purposes or shared with others. Unauthorized use may result in disciplinary action or legal consequences. If you have received this notebook without authorization, please contact manaranjan@gmail.com immediately and delete all copies.*

In [None]:
!pip -q install langchain openai tiktoken langchain_openai langchain_groq

In [None]:
!pip show langchain

Name: langchain
Version: 0.3.19
Summary: Building applications with LLMs through composability
Home-page: 
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.11/dist-packages
Requires: aiohttp, langchain-core, langchain-text-splitters, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


# Summarization

In [None]:
import os
from getpass import getpass

#os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
os.environ["GROQ_API_KEY"] = getpass("Enter your Groq API key: ")

Enter your Groq API key: ··········


### Setting up Summarization Chain

In [None]:
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import ChatOpenAI
from langchain_groq import ChatGroq

In [None]:
#llm = ChatOpenAI(model_name='gpt-3.5-turbo',
#             temperature=0.2,
#             max_tokens = 256)

llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0,
    max_tokens=256,
    max_retries=2,
)

In [None]:
!wget https://raw.githubusercontent.com/manaranjanp/ISBNLPv1/main/datasets/gpu_shortage

--2025-03-04 06:47:59--  https://raw.githubusercontent.com/manaranjanp/ISBNLPv1/main/datasets/gpu_shortage
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6947 (6.8K) [text/plain]
Saving to: ‘gpu_shortage’


2025-03-04 06:47:59 (67.6 MB/s) - ‘gpu_shortage’ saved [6947/6947]



In [None]:
!ls -al

total 32
drwxr-xr-x 1 root root 4096 Mar  4 06:47  .
drwxr-xr-x 1 root root 4096 Mar  4 06:25  ..
drwxr-xr-x 4 root root 4096 Feb 28 14:19  .config
-rw-r--r-- 1 root root 6947 Mar  4 06:31 'gpu shortage'
-rw-r--r-- 1 root root 6947 Mar  4 06:47  gpu_shortage
drwxr-xr-x 1 root root 4096 Feb 28 14:20  sample_data


In [None]:
# load the doc
with open('gpu_shortage') as f:
    gpu_shortage_essay = f.read()

In [None]:
len(gpu_shortage_essay)

6751

In [None]:
text_splitter = CharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)
texts = text_splitter.split_text(gpu_shortage_essay)

In [None]:
len(texts)

9
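
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a minimal standard-library sketch of separator-based splitting with overlap. It is a simplification and not LangChain's implementation: the function name `split_with_overlap` is hypothetical, and the merging rule is an assumption made for illustration. It splits on blank lines, packs paragraphs greedily into chunks of up to `chunk_size` characters, and starts each new chunk with the trailing paragraphs that fit within `chunk_overlap`.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks (illustrative only)."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        # Close the current chunk if adding this piece would exceed the limit.
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Keep only the trailing pieces that fit in the overlap budget.
            while current and joined_len(current) > chunk_overlap:
                current.pop(0)
        # A single piece longer than chunk_size is still emitted whole.
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


# Six made-up paragraphs of 313 characters each.
sample = "\n\n".join(f"Paragraph {i}: " + "x" * 300 for i in range(6))
chunks = split_with_overlap(sample, chunk_size=1000, chunk_overlap=350)
print(len(chunks), [len(c) for c in chunks])
```

In this sketch the last paragraph of one chunk reappears at the start of the next, which is the effect the overlap setting is meant to achieve: context that straddles a chunk boundary is not lost.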

In [None]:
from langchain.docstore.document import Document

# Wrap only the first four chunks as Document objects
docs = [Document(page_content=t) for t in texts[:4]]

In [None]:
docs

[Document(metadata={}, page_content='As compute-hungry generative AI shows no signs of slowing down, which companies are getting access to Nvidia’s hard-to-come-by, ultra-expensive, high-performance computing H100 GPU for large language model (LLM) training is becoming the “top gossip” of Silicon Valley, according to Andrej Karpathy, former director of AI at Tesla and now at OpenAI.\n\nKarpathy’s comments come at a moment where issues related to GPU access are even being discussed in big tech annual reports: In Microsoft’s annual report released last week, the company emphasized to investors that GPUs are a “critical raw material for its fast-growing cloud business” and added language about GPUs to a “risk factor for outages that can arise if it can’t get the infrastructure it needs.”'),
 Document(metadata={}, page_content='Karpathy took to the social network X (formerly Twitter) to re-share a widely circulated blog post thought to be authored by a poster on Hacker News that speculates

## 3 types of CombineDocuments Chains

[Taken from the LangChain Docs](https://langchain.readthedocs.io/en/latest/modules/indexes/combine_docs.html)

## Simple summarization with a bullet-point prompt

In [None]:
prompt_template = """Write a concise bullet point summary of the following:

{text}

CONCISE SUMMARY IN BULLET POINTS:"""

BULLET_POINT_PROMPT = PromptTemplate(template=prompt_template,
                        input_variables=["text"])



### Stuffing
Stuffing is the simplest method: all the related data is placed ("stuffed") into the prompt as context for the language model. LangChain implements this as the StuffDocumentsChain.

**Pros:** Makes only a single call to the LLM. When generating text, the LLM has access to all the data at once.

**Cons:** LLMs have a limited context length, so for large documents (or many documents) the resulting prompt will exceed that limit and the call will not work.

The main downside of this method is that **it only works on smaller pieces of data.** Once you are working with many pieces of data, this approach is no longer feasible. The next two approaches are designed to deal with that.
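
The context-length constraint can be made concrete with a small standard-library sketch. It joins all chunks into one prompt and checks a rough token estimate against a budget. The names `build_stuff_prompt`, `estimate_tokens`, and `fits_context`, the 4-characters-per-token heuristic, and the 8,000-token limit are all assumptions for illustration; real limits and token counts depend on the model and its tokenizer.

```python
STUFF_TEMPLATE = "Write a concise bullet point summary of the following:\n\n{text}\n\nSUMMARY:"


def build_stuff_prompt(chunks, template=STUFF_TEMPLATE):
    """Join every chunk into a single prompt, as the stuffing approach does."""
    return template.format(text="\n\n".join(chunks))


def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate (assumed heuristic, not a real tokenizer)."""
    return len(text) // chars_per_token + 1


def fits_context(prompt, context_limit=8000, reserved_for_output=256):
    """Check whether the prompt plus the reply budget fits the assumed limit."""
    return estimate_tokens(prompt) + reserved_for_output <= context_limit


small = ["chunk one " * 50, "chunk two " * 50]   # about 1,000 characters in total
large = ["filler text " * 1000] * 10             # about 120,000 characters in total

print(fits_context(build_stuff_prompt(small)))
print(fits_context(build_stuff_prompt(large)))
```

A check like this, run before calling the chain, is one way to decide whether stuffing is viable or whether one of the multi-call strategies is needed.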



In [None]:
import textwrap

In [None]:
from langchain.chains.summarize import load_summarize_chain

In [None]:
chain = load_summarize_chain(llm,
                             chain_type="stuff",
                             prompt=BULLET_POINT_PROMPT)

output_summary = chain.invoke(docs)

In [None]:
from pprint import pprint

In [None]:
pprint(output_summary)

{'input_documents': [Document(metadata={}, page_content='As compute-hungry generative AI shows no signs of slowing down, which companies are getting access to Nvidia’s hard-to-come-by, ultra-expensive, high-performance computing H100 GPU for large language model (LLM) training is becoming the “top gossip” of Silicon Valley, according to Andrej Karpathy, former director of AI at Tesla and now at OpenAI.\n\nKarpathy’s comments come at a moment where issues related to GPU access are even being discussed in big tech annual reports: In Microsoft’s annual report released last week, the company emphasized to investors that GPUs are a “critical raw material for its fast-growing cloud business” and added language about GPUs to a “risk factor for outages that can arise if it can’t get the infrastructure it needs.”'),
                     Document(metadata={}, page_content='Karpathy took to the social network X (formerly Twitter) to re-share a widely circulated blog post thought to be authored by

In [None]:
wrapped_text = textwrap.fill(output_summary['output_text'],
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

Here is a concise summary of the article in bullet points:

* Demand for Nvidia's high-performance
H100 GPU is extremely high, particularly for large language model (LLM) training, with companies
like OpenAI, Meta, and Microsoft competing for access.
* The shortage of H100 GPUs is becoming a
major issue, with estimated demand totaling around 432,000 units, worth approximately $15 billion.
*
Companies are looking for ways to optimize their use of GPUs, with some investing in technologies
that can reduce compute costs and increase efficiency.
* The battle for access to AI chips is
likened to "Game of Thrones," with companies fighting for limited resources to power their AI
models.
* Alternative solutions, such as optimizing ML models to work on legacy hardware, may help
increase supply, but may be more effective for AI inference rather than LLM training.
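
The uneven line breaks in the output above arise because `textwrap.fill` treats the whole string as a single paragraph: with `replace_whitespace=False`, the newlines in the model's reply are kept but the text is still wrapped as one run. One possible workaround, sketched here with a hypothetical helper named `wrap_preserving_lines`, is to wrap each line separately so that every bullet starts on its own line. The sample text is made up.

```python
import textwrap


def wrap_preserving_lines(text, width=100):
    """Wrap each line of `text` on its own so existing line breaks survive."""
    wrapped_lines = []
    for line in text.splitlines():
        if not line.strip():
            wrapped_lines.append("")  # keep blank lines between paragraphs
            continue
        wrapped_lines.append(
            textwrap.fill(line, width=width, break_long_words=False,
                          subsequent_indent="  ")
        )
    return "\n".join(wrapped_lines)


sample = (
    "Here is a summary:\n\n"
    "* " + "first point " * 12 + "\n"
    "* second point"
)
print(wrap_preserving_lines(sample, width=60))
```

With the notebook's variables, the call would be `print(wrap_preserving_lines(output_summary['output_text']))`.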


### map_reduce summarization with custom prompt

In [None]:
# with a custom prompt
map_prompt_template = """Write a concise summary of the following chunk from a document:

{text}

CONCISE SUMMARY IN BULLET POINTS:"""

reduce_prompt_template = """Summarize the summaries below. Create a final summary by consolidating the summaries below into bullet points.

{text}
"""

MAP_PROMPT = PromptTemplate(template=map_prompt_template,
                        input_variables=["text"])

REDUCE_PROMPT = PromptTemplate(template=reduce_prompt_template,
                               input_variables=["text"])

## with intermediate steps
chain = load_summarize_chain(llm,
                             chain_type="map_reduce",
                             return_intermediate_steps=True,
                             map_prompt=MAP_PROMPT,
                             combine_prompt=REDUCE_PROMPT,
                             verbose = True)

output_summary = chain.invoke({"input_documents": docs}, return_only_outputs=True)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following chunk froma document:

As compute-hungry generative AI shows no signs of slowing down, which companies are getting access to Nvidia’s hard-to-come-by, ultra-expensive, high-performance computing H100 GPU for large language model (LLM) training is becoming the “top gossip” of Silicon Valley, according to Andrej Karpathy, former director of AI at Tesla and now at OpenAI.

Karpathy’s comments come at a moment where issues related to GPU access are even being discussed in big tech annual reports: In Microsoft’s annual report released last week, the company emphasized to investors that GPUs are a “critical raw material for its fast-growing cloud business” and added language about GPUs to a “risk factor for outages that can arise if it can’t get the infrastructure it needs.”

CONSCISE SUMMARY IN BULLET POINTS:


In [None]:
wrapped_text = textwrap.fill(output_summary['output_text'],
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

Here is a final summary consolidating the information in bullet points:

* The demand for Nvidia's
H100 GPUs is extremely high, driven by large language model training, with an estimated demand of
approximately 432,000 units, worth around $15 billion.
* Major tech companies, such as Microsoft,
OpenAI, Meta, and others, are seeking large quantities of H100 GPUs, with some estimates including:
+ OpenAI: 50,000
  + Inflection: 22,000
  + Meta: 25,000
  + Big clouds: 30,000 each
  + Private
clouds: 100,000 total
* The limited availability of H100 GPUs is a concern for these companies, with
some speculating that large-scale H100 clusters at cloud providers are running out of capacity.
*
Financial companies and others are also deploying hundreds to thousands of A100/H100 GPUs, further
driving up demand.
* To address the shortage, companies like Radical are investing in technologies
like CentML, which optimizes machine learning models to run on legacy hardware, potentially
increasing the supp

In [None]:
wrapped_text = textwrap.fill(output_summary['intermediate_steps'][1],
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

Here is a concise summary in bullet points:

* A blog post speculates that large-scale H100 clusters
at cloud providers are running out of capacity.
* Demand for H100s is expected to continue until at
least the end of 2024.
* Estimated H100 demands:
  + OpenAI: 50,000
  + Inflection: 22,000
  + Meta:
25,000
  + Big clouds (Azure, Google Cloud, AWS, Oracle): 30,000 each
  + Private clouds (Lambda,
CoreWeave, etc.): 100,000 total
  + Anthropic, Helsing, Mistral, and Character: 10,000 each


In [None]:
wrapped_text = textwrap.fill(output_summary['intermediate_steps'][2],
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

Here is a concise summary in bullet points:

* Estimated demand for H100 GPUs: approximately 432,000
units, worth around $15 billion
* This estimate excludes Chinese companies like ByteDance, Baidu,
and Tencent
* Financial companies like Jane Street, JP Morgan, and Citadel are also deploying
hundreds to thousands of A100/H100 GPUs
* Demand for AI chips is extremely high, likened to the
battle for power in "Game of Thrones" due to the insatiable appetite for compute power to run large
AI models.
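
Conceptually, the `map_reduce` chain summarizes each chunk independently (the map step, whose results appear under `intermediate_steps`) and then summarizes the combined partial summaries (the reduce step). The sketch below mirrors that flow in plain Python. `fake_llm` is a hypothetical stand-in that simply returns the first sentence of the text after the instruction line, so the example runs without an API key; the templates and function names are assumptions, and this is not how the LangChain chain is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor

MAP_TEMPLATE = "Summarize this chunk:\n{text}"
REDUCE_TEMPLATE = "Consolidate these summaries:\n{text}"


def fake_llm(prompt):
    """Hypothetical stand-in for a model call: returns the first sentence of the body."""
    body = prompt.split("\n", 1)[1]
    return body.split(".")[0].strip() + "."


def map_reduce_summarize(chunks, llm=fake_llm):
    # Map: each chunk is summarized on its own, so the calls can run in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partial = list(pool.map(lambda c: llm(MAP_TEMPLATE.format(text=c)), chunks))
    # Reduce: one more call over the joined partial summaries.
    final = llm(REDUCE_TEMPLATE.format(text="\n".join(partial)))
    return {"intermediate_steps": partial, "output_text": final}


chunks = [
    "H100 GPUs are scarce. Many companies want them.",
    "Demand is estimated at 432,000 units. That excludes some buyers.",
]
result = map_reduce_summarize(chunks)
print(result["intermediate_steps"])
print(result["output_text"])
```

Because the map calls do not depend on one another, they are the part that can be parallelized; only the final reduce call has to wait for all of them.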


### Refine: the 'refine' CombineDocuments chain
This method involves **an initial prompt on the first chunk of data, generating some output. For the remaining documents, that output is passed in, along with the next document**, asking the LLM to refine the output based on the new document.

**Pros:** Can pull in more relevant context, and may be less lossy than MapReduceDocumentsChain.

**Cons:** Requires many more calls to the LLM than StuffDocumentsChain. The calls are also NOT independent, meaning they cannot be parallelized as in MapReduceDocumentsChain. The result may also depend on the order of the documents.
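
The sequential dependency described above can be sketched in a few lines of plain Python. `fake_llm` is a hypothetical stand-in for a real model call (it records the prompts it receives and returns a numbered placeholder), and the prompt wording is an assumption, not LangChain's built-in refine prompt.

```python
INITIAL_TEMPLATE = "Write a concise summary of the following:\n{text}"
REFINE_TEMPLATE = (
    "Existing summary:\n{existing}\n\n"
    "Refine it using the additional context below:\n{text}"
)


def refine_summarize(chunks, llm):
    """Summarize the first chunk, then refine that summary chunk by chunk."""
    summary = llm(INITIAL_TEMPLATE.format(text=chunks[0]))
    for chunk in chunks[1:]:
        # Each call needs the previous summary, so the calls cannot run in parallel.
        summary = llm(REFINE_TEMPLATE.format(existing=summary, text=chunk))
    return summary


calls = []


def fake_llm(prompt):
    """Hypothetical stand-in: logs the prompt and returns a numbered placeholder."""
    calls.append(prompt)
    return f"summary v{len(calls)}"


final = refine_summarize(["chunk A", "chunk B", "chunk C"], fake_llm)
print(final, len(calls))
```

The loop makes one call per chunk, and each prompt after the first contains the previous summary, which is why reordering the chunks can change the final result.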

In [None]:
chain = load_summarize_chain(llm, chain_type="refine")

output_summary = chain.invoke(docs)

In [None]:
wrapped_text = textwrap.fill(output_summary['output_text'], width=100)
print(wrapped_text)

Access to Nvidia's high-performance H100 GPU is a highly sought-after and limited resource for
training large language models, with companies like Microsoft highlighting GPU availability as a
critical factor in their business operations. The demand for H100 GPUs is expected to continue until
at least the end of 2024, with various companies and cloud providers speculated to require large
quantities, including OpenAI, Inflection, Meta, and major cloud providers like Azure, Google Cloud,
and AWS, further exacerbating the limited availability of this resource. Estimates suggest that the
total demand could be around 432,000 H100 GPUs, valued at approximately $15 billion, although this
may involve double-counting and excludes additional demand from Chinese companies and financial
institutions, which are also expected to require significant quantities of H100s or similar GPUs,
such as the H800s or A100s. However, efforts to optimize machine learning models, such as those by
CentML, may help i