# Summarization using LLMs

#### Developed By: Manaranjan Pradhan
#### www.manaranjanp.com

*This Jupyter notebook is confidential and proprietary to Manaranjan Pradhan. It is intended solely for authorized training purposes. Unauthorized distribution, sharing, or reproduction of this notebook or its contents is strictly prohibited. This material is for personal learning within the training program only and may not be used for commercial purposes or shared with others. Unauthorized use may result in disciplinary action or legal consequences. If you have received this notebook without authorization, please contact manaranjan@gmail.com immediately and delete all copies.*

In [1]:
!pip -q install langchain openai tiktoken langchain_openai


In [2]:
!pip show langchain

Name: langchain
Version: 0.3.17
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.11/dist-packages
Requires: aiohttp, langchain-core, langchain-text-splitters, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


# Summarization

In [3]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

Enter your OpenAI API key: ··········


### Setting up Summarization Chain

In [4]:
from langchain.chains import LLMChain
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import ChatOpenAI

In [5]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo',
                 temperature=0.2,
                 max_tokens=256)

In [6]:
!wget https://raw.githubusercontent.com/manaranjanp/ISBNLPv1/main/datasets/gpu_shortage

--2025-02-11 13:36:45--  https://raw.githubusercontent.com/manaranjanp/ISBNLPv1/main/datasets/gpu_shortage
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6947 (6.8K) [text/plain]
Saving to: ‘gpu_shortage’


2025-02-11 13:36:45 (69.9 MB/s) - ‘gpu_shortage’ saved [6947/6947]



In [7]:
!ls -al

total 24
drwxr-xr-x 1 root root 4096 Feb 11 13:36 .
drwxr-xr-x 1 root root 4096 Feb 11 13:32 ..
drwxr-xr-x 4 root root 4096 Feb  7 14:19 .config
-rw-r--r-- 1 root root 6947 Feb 11 13:36 gpu_shortage
drwxr-xr-x 1 root root 4096 Feb  7 14:20 sample_data


In [8]:
# load the doc
with open('gpu_shortage') as f:
    gpu_shortage_essay = f.read()

In [None]:
len(gpu_shortage_essay)

6751

In [9]:
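# CharacterTextSplitter splits on blank lines ("\n\n") by default, then merges
# the pieces into chunks of up to ~1000 characters with up to 200 characters of overlap.
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_text(gpu_shortage_essay)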

In [10]:
len(texts)

9
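
The chunk count depends on where the paragraph breaks fall, not only on the document length. The following is a minimal pure-Python sketch of that idea: split on blank lines, pack paragraphs greedily up to `chunk_size`, and carry trailing paragraphs forward as overlap. It is a simplified illustration under those assumptions, not LangChain's actual implementation, and `split_paragraphs` is a hypothetical helper name.

```python
def split_paragraphs(text, chunk_size=1000, chunk_overlap=200, separator="\n\n"):
    """Greedy paragraph packing (illustrative only, not LangChain's code)."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Carry trailing paragraphs forward, up to chunk_overlap characters.
            carried = []
            for prev in reversed(current):
                if len(separator.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            # Drop carried paragraphs if they would push the next chunk over the limit.
            while carried and len(separator.join(carried + [piece])) > chunk_size:
                carried.pop(0)
            current = carried
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "\n\n".join(["a" * 400, "b" * 400, "c" * 150, "d" * 400])
sample_chunks = split_paragraphs(sample)
print([len(c) for c in sample_chunks])
```

In this sketch a single paragraph longer than `chunk_size` is emitted as an oversized chunk rather than being cut mid-paragraph.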

In [11]:
from langchain.docstore.document import Document

docs = [Document(page_content=t) for t in texts[:4]]

In [12]:
docs

[Document(metadata={}, page_content='As compute-hungry generative AI shows no signs of slowing down, which companies are getting access to Nvidia’s hard-to-come-by, ultra-expensive, high-performance computing H100 GPU for large language model (LLM) training is becoming the “top gossip” of Silicon Valley, according to Andrej Karpathy, former director of AI at Tesla and now at OpenAI.\n\nKarpathy’s comments come at a moment where issues related to GPU access are even being discussed in big tech annual reports: In Microsoft’s annual report released last week, the company emphasized to investors that GPUs are a “critical raw material for its fast-growing cloud business” and added language about GPUs to a “risk factor for outages that can arise if it can’t get the infrastructure it needs.”'),
 Document(metadata={}, page_content='Karpathy took to the social network X (formerly Twitter) to re-share a widely circulated blog post thought to be authored by a poster on Hacker News that speculates

## 3 types of CombineDocuments Chains

[Taken from the LangChain Docs](https://langchain.readthedocs.io/en/latest/modules/indexes/combine_docs.html)

## Summarize with a custom bullet-point prompt

In [13]:
prompt_template = """Write a concise bullet point summary of the following:

{text}

CONCISE SUMMARY IN BULLET POINTS:"""

BULLET_POINT_PROMPT = PromptTemplate(template=prompt_template,
                                     input_variables=["text"])



### Stuffing
Stuffing is the simplest method: all the related data is placed into the prompt as context and passed to the language model. LangChain implements this as the StuffDocumentsChain.

**Pros:** Only makes a single call to the LLM. When generating text, the LLM has access to all the data at once.

**Cons:** Most LLMs have a limited context length, so for large documents (or many documents) the resulting prompt exceeds that limit and the call fails.

The main downside of this method is that **it only works on smaller pieces of data.** Once you are working with many pieces of data, this approach is no longer feasible. The next two approaches are designed to deal with that.
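
To make the context-length limit concrete, here is a minimal sketch of what stuffing amounts to: join the chunks, fill the prompt template, and check whether the result plausibly fits. The helper names `build_stuff_prompt` and `fits_in_context` are hypothetical, and the figure of roughly 4 characters per token is a rough heuristic for English text, not an exact count. A real check would use a tokenizer such as `tiktoken`.

```python
PROMPT_TEMPLATE = """Write a concise bullet point summary of the following:

{text}

CONCISE SUMMARY IN BULLET POINTS:"""


def build_stuff_prompt(chunks, template=PROMPT_TEMPLATE, separator="\n\n"):
    """Concatenate every chunk into a single prompt (the 'stuff' strategy)."""
    return template.format(text=separator.join(chunks))


def fits_in_context(prompt, context_limit_tokens, reserved_for_output=256,
                    chars_per_token=4):
    """Rough estimate only: assumes about 4 characters per token."""
    estimated_prompt_tokens = len(prompt) // chars_per_token + 1
    return estimated_prompt_tokens + reserved_for_output <= context_limit_tokens


small_prompt = build_stuff_prompt(["alpha", "beta"])
print(fits_in_context(small_prompt, context_limit_tokens=4096))
```

The `reserved_for_output` default mirrors the `max_tokens=256` setting used for the model above, since the completion shares the context window with the prompt.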



In [14]:
import textwrap

In [15]:
from langchain.chains.summarize import load_summarize_chain

In [16]:
chain = load_summarize_chain(llm,
                             chain_type="stuff",
                             prompt=BULLET_POINT_PROMPT)

output_summary = chain.invoke(docs)

In [17]:
from pprint import pprint

In [18]:
pprint(output_summary)

{'input_documents': [Document(metadata={}, page_content='As compute-hungry generative AI shows no signs of slowing down, which companies are getting access to Nvidia’s hard-to-come-by, ultra-expensive, high-performance computing H100 GPU for large language model (LLM) training is becoming the “top gossip” of Silicon Valley, according to Andrej Karpathy, former director of AI at Tesla and now at OpenAI.\n\nKarpathy’s comments come at a moment where issues related to GPU access are even being discussed in big tech annual reports: In Microsoft’s annual report released last week, the company emphasized to investors that GPUs are a “critical raw material for its fast-growing cloud business” and added language about GPUs to a “risk factor for outages that can arise if it can’t get the infrastructure it needs.”'),
                     Document(metadata={}, page_content='Karpathy took to the social network X (formerly Twitter) to re-share a widely circulated blog post thought to be authored by

In [19]:
wrapped_text = textwrap.fill(output_summary['output_text'],
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

- Demand for Nvidia's H100 GPU for large language model training is high in Silicon Valley
- Issues
related to GPU access are being discussed in big tech annual reports
- Speculation suggests high
demand for H100 GPUs from companies like OpenAI, Meta, and cloud providers
- Estimated demand for
H100 GPUs is around 432k, totaling about $15B
- VC compares demand for GPUs to 'Game of Thrones'
battle for access
- CentML offers optimization for ML models to run on legacy hardware, increasing
chip supply
- Efforts to increase efficiency in AI inference may be more effective than training
LLMs from scratch
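
The bullets above break mid-line because `textwrap.fill` with `replace_whitespace=False` keeps the model's own newlines while still counting width across them. One possible alternative, sketched below, wraps each line of the summary separately. `wrap_bullets` is a hypothetical helper and not part of the original notebook.

```python
import textwrap


def wrap_bullets(text, width=100):
    """Wrap each line on its own so every bullet starts on a fresh line."""
    wrapped_lines = []
    for line in text.splitlines():
        if not line.strip():
            wrapped_lines.append("")  # preserve blank lines
            continue
        wrapped_lines.append(
            textwrap.fill(line,
                          width=width,
                          break_long_words=False,
                          subsequent_indent="  ")  # indent continuation lines
        )
    return "\n".join(wrapped_lines)


sample_summary = "- " + "word " * 30 + "\n- short"
wrapped_sample = wrap_bullets(sample_summary, width=40)
print(wrapped_sample)
```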


### map_reduce summarization with custom prompt

In [20]:
# with a custom prompt
prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN BULLET POINTS:"""

PROMPT = PromptTemplate(template=prompt_template,
                        input_variables=["text"])

# with intermediate steps: also return each chunk's individual summary
chain = load_summarize_chain(llm,
                             chain_type="map_reduce",
                             return_intermediate_steps=True,
                             map_prompt=PROMPT,
                             combine_prompt=PROMPT)

output_summary = chain.invoke({"input_documents": docs}, return_only_outputs=True)

In [21]:
wrapped_text = textwrap.fill(output_summary['output_text'],
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

- Generative AI requires high-performance computing resources, with Nvidia's H100 GPU being highly
sought after in Silicon Valley
- GPU availability is a top gossip topic among tech companies and is
emphasized in Microsoft's annual report for its cloud business
- Demand for H100 GPUs is predicted
to continue until at least the end of 2024, with estimates suggesting a need for 432k GPUs worth
$15B
- Chinese companies and financial firms are also deploying large numbers of GPUs for AI
applications
- Companies like CentML and d-Matrix are working on optimizing machine learning models
for faster performance and cost savings, particularly for AI inference.


In [22]:
wrapped_text = textwrap.fill(output_summary['intermediate_steps'][1],
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

- Karpathy shared a blog post speculating about the capacity of large scale H100 clusters at cloud
providers
- The post predicts that H100 demand will continue until at least the end of 2024
- The
author estimates that OpenAI, Inflection, Meta, and big cloud providers will want a significant
number of H100s
- Private clouds like Lambda and CoreWeave may also have high demand for H100s


In [23]:
wrapped_text = textwrap.fill(output_summary['intermediate_steps'][2],
                             width=100,
                             break_long_words=False,
                             replace_whitespace=False)
print(wrapped_text)

- Estimates suggest there is a demand for approximately 432k H100 GPUs, totaling about $15B worth of
GPUs
- Chinese companies like ByteDance, Baidu, and Tencent are also expected to require a
significant number of H800 GPUs
- Financial companies such as Jane Street, JP Morgan, Two Sigma, and
Citadel are deploying hundreds to thousands of A/H100 GPUs
- The demand for GPUs for AI applications
is compared to the battle for power in 'Game of Thrones' by a venture capitalist


### With the 'refine' CombineDocuments Chain

## Refine
This method runs **an initial prompt on the first chunk of data to generate some output. For each remaining document, that output is passed in along with the next document**, and the LLM is asked to refine the output based on the new document.

**Pros:** Can pull in more relevant context, and may be less lossy than MapReduceDocumentsChain.

**Cons:** Requires many more calls to the LLM than StuffDocumentsChain. The calls are also NOT independent, so they cannot be parallelized as in MapReduceDocumentsChain. The result may also depend on the ordering of the documents.
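
The sequential dependency described above can be sketched as a simple loop. This is an illustration of the control flow only, not LangChain's implementation. `toy_first` and `toy_refine` are hypothetical stand-ins for the initial and refine LLM calls.

```python
def refine_summarize(chunks, summarize_first, refine_with):
    """Summarize the first chunk, then fold each later chunk into the summary."""
    if not chunks:
        raise ValueError("need at least one chunk")
    summary = summarize_first(chunks[0])
    for chunk in chunks[1:]:
        # Each call needs the previous summary, so the calls cannot run in parallel.
        summary = refine_with(summary, chunk)
    return summary


def toy_first(chunk):
    # Hypothetical stand-in for the initial LLM call.
    return chunk.upper()


def toy_refine(summary, chunk):
    # Hypothetical stand-in for the refine LLM call.
    return f"{summary} + {chunk.upper()}"


forward = refine_summarize(["a", "b", "c"], toy_first, toy_refine)
backward = refine_summarize(["c", "b", "a"], toy_first, toy_refine)
print(forward, "|", backward)
```

Running the same chunks in a different order gives a different result here, which mirrors the ordering dependence noted in the cons.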

In [24]:
chain = load_summarize_chain(llm, chain_type="refine")

output_summary = chain.invoke(docs)

In [25]:
wrapped_text = textwrap.fill(output_summary['output_text'], width=100)
print(wrapped_text)

The demand for Nvidia's high-performance computing H100 GPU for large language model training is
increasing among companies, leading to competition and speculation in Silicon Valley. Issues related
to GPU access are being highlighted in big tech annual reports, with companies like Microsoft
emphasizing the importance of GPUs for their cloud business and the potential risks of
infrastructure shortages. Speculation on the capacity of large-scale H100 clusters at cloud
providers and the projected demand from various companies, including OpenAI, Inflection, Meta, and
others, suggests that the demand for H100 GPUs will continue to rise until at least the end of 2024.
Estimates suggest a potential need for 432k H100 GPUs, valued at approximately $15B, with additional
demand expected from Chinese companies like ByteDance, Baidu, and Tencent, as well as financial
institutions like Jane Street, JP Morgan, Two Sigma, and Citadel. The competition for access to AI
chips is likened to the battles i