# Large File Summarization using LangChain/LCEL with Bedrock API 
## GenAI Code Accelerator 
Author: Sundaresan Manoharan - Enterprise Architecture AI/ML Team
> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

Text summarization in Natural Language Processing (NLP) is the process of condensing a large text into a shorter version. Summarization models, typically built with deep learning, pull the important information out of a document while preserving its meaning, and present it in a concise, coherent form. This makes it possible to digest the essence of large volumes of content efficiently. Summarization is a key capability of LLMs, with many potential applications across industries that improve understanding and save time. This notebook demonstrates text summarization using the Amazon Bedrock API.

Challenge: One key challenge is handling large documents that exceed the model's token limit. Another is getting high-quality summaries. With large documents, the input text might not fit into the model's context length, the model may hallucinate, or you may hit out-of-memory errors.

To solve these problems, we show an architecture based on chunking and chaining prompts. It uses LangChain, a popular framework for developing applications powered by language models.

Use Cases:
- Books, Articles, Blogs, Research Papers

Foundation Model(s):
- Amazon Titan Large
- Meta Llama 2 13B

This notebook introduces text summarization with the Amazon Bedrock API.
- Works with various foundation models (LLM agnostic)
- Uses a long text document (Amazon's 2022 letter to shareholders); the same approach applies to earnings call transcripts and business/financial reports
- Provides simple, bite-sized code that is easy to adapt



In this architecture:

1. A large document (or one large file built by concatenating smaller ones) is loaded
1. A LangChain text splitter splits it into multiple smaller chunks (chunking)
1. The first chunk is sent to the model, which returns a summary of it
1. LangChain takes the next chunk, appends it to the returned summary, and sends the combined text to the model as a new request; this repeats until all chunks are processed
1. In the end, you have a final summary based on the entire content
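The five steps above describe an iterative, "refine"-style loop that carries a running summary forward through the chunks. Here is a minimal sketch in plain Python. It assumes a hypothetical `summarize(prompt) -> str` callable that wraps a model call. The prompt wording is illustrative only and is not LangChain's built-in refine prompt.

```python
from typing import Callable, List


def refine_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Sketch of the chunk-and-chain loop.

    `summarize` is a hypothetical stand-in for a model call (for example,
    a LangChain Bedrock LLM) that takes a prompt string and returns text.
    """
    if not chunks:
        return ""
    # Step 3: summarize the first chunk on its own.
    summary = summarize(f"Summarize the following text:\n{chunks[0]}")
    # Step 4: combine the running summary with the next chunk and ask for an updated summary.
    for chunk in chunks[1:]:
        prompt = (
            "Here is an existing summary:\n"
            f"{summary}\n"
            "Refine it using this additional text:\n"
            f"{chunk}"
        )
        summary = summarize(prompt)
    # Step 5: the last summary reflects every chunk.
    return summary
```

Each call depends on the previous summary, so the calls run strictly in sequence. Total latency therefore grows with the number of chunks.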


### Install Libraries

In [1]:
!pip install --upgrade pip


In [2]:
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"


In [4]:
%pip install langchain
# transformers provides the GPT-2 tokenizer that LangChain uses by default in get_num_tokens()
%pip install transformers


Note: you may need to restart the kernel to use updated packages.

### Import Libraries

In [5]:
import json
import os
import sys
import pandas as pd

import boto3
import botocore
from IPython.display import display_markdown, Markdown, clear_output

from langchain.llms.bedrock import Bedrock


### Initialize boto session

In [6]:
# module_path = ".."
# sys.path.append(os.path.abspath(module_path))

boto_session = boto3.Session()
aws_region = boto_session.region_name
print(aws_region)
br_client = boto_session.client("bedrock", region_name=aws_region)
br_runtime = boto_session.client("bedrock-runtime", region_name=aws_region)


us-east-1


### Test Connection & List Foundation Models

In [7]:
fms = br_client.list_foundation_models()['modelSummaries']
dfFM = pd.DataFrame(fms)
print(dfFM.shape)
dfFM.head()

(45, 10)


| | modelArn | modelId | modelName | providerName | inputModalities | outputModalities | responseStreamingSupported | customizationsSupported | inferenceTypesSupported | modelLifecycle |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-tg1-large | Titan Text Large | Amazon | [TEXT] | [TEXT] | True | [] | [ON_DEMAND] | {'status': 'ACTIVE'} |
| 1 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-image-generator-v1:0 | Titan Image Generator G1 | Amazon | [TEXT, IMAGE] | [IMAGE] | | [FINE_TUNING] | [ON_DEMAND, PROVISIONED] | {'status': 'ACTIVE'} |
| 2 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-image-generator-v1 | Titan Image Generator G1 | Amazon | [TEXT, IMAGE] | [IMAGE] | | [] | [ON_DEMAND] | {'status': 'ACTIVE'} |
| 3 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-embed-g1-text-02 | Titan Text Embeddings v2 | Amazon | [TEXT] | [EMBEDDING] | | [] | [ON_DEMAND] | {'status': 'ACTIVE'} |
| 4 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-text-lite-v1:0:4k | Titan Text G1 - Lite | Amazon | [TEXT] | [TEXT] | True | [FINE_TUNING, CONTINUED_PRE_TRAINING] | [PROVISIONED] | {'status': 'ACTIVE'} |


In [8]:
dfFM.columns

Index(['modelArn', 'modelId', 'modelName', 'providerName', 'inputModalities',
       'outputModalities', 'responseStreamingSupported',
       'customizationsSupported', 'inferenceTypesSupported', 'modelLifecycle'],
      dtype='object')

In [9]:
dfFM.modelName.unique()

array(['Titan Text Large', 'Titan Image Generator G1',
       'Titan Text Embeddings v2', 'Titan Text G1 - Lite',
       'Titan Text G1 - Express', 'Titan Embeddings G1 - Text',
       'Titan Multimodal Embeddings G1', 'SDXL 0.8', 'SDXL 1.0',
       'J2 Grande Instruct', 'J2 Jumbo Instruct', 'Jurassic-2 Mid',
       'Jurassic-2 Ultra', 'Claude Instant', 'Claude', 'Command',
       'Command Light', 'Embed English', 'Embed Multilingual',
       'Llama 2 Chat 13B', 'Llama 2 Chat 70B', 'Llama 2 13B',
       'Llama 2 70B'], dtype=object)

## Summarize long text 

### Configuring LangChain with Boto3

LangChain accesses Bedrock through a boto3 client. If you don't pass a client, LangChain tries to create one from the session information in your environment. To make sure the right client is used, we pass the `bedrock-runtime` client created earlier (`br_runtime`) through the `client` argument.

You need to specify the model for LangChain's `Bedrock` class, and you can also pass inference arguments. Here you specify Amazon Titan Text Large in `model_id` and pass Titan's inference parameters in `model_kwargs`. LangChain forwards these to Titan as its `textGenerationConfig`.

In [10]:
modelId = "amazon.titan-tg1-large"
llm = Bedrock(
    model_id=modelId,
    model_kwargs={
        "maxTokenCount": 4096,
        "stopSequences": [],
        "temperature": 0,
        "topP": 1,
    },
    client=br_runtime,
)
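LangChain's `Bedrock` class builds the request body for you. For reference, the sketch below shows roughly what a Titan text request looks like when you call `invoke_model` directly on the `bedrock-runtime` client. The prompt goes in `inputText` and the inference parameters go in `textGenerationConfig`. The helper names are hypothetical. The client is passed in as an argument, so the sketch does not import boto3 itself.

```python
import json


def build_titan_request(prompt: str, max_tokens: int = 4096,
                        temperature: float = 0.0, top_p: float = 1.0) -> str:
    """Serialize a Titan text request body (same parameters as model_kwargs above)."""
    body = {
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "stopSequences": [],
            "temperature": temperature,
            "topP": top_p,
        },
    }
    return json.dumps(body)


def parse_titan_response(raw: bytes) -> str:
    """Extract the generated text from a Titan text response body."""
    payload = json.loads(raw)
    return payload["results"][0]["outputText"]


def invoke_titan(runtime_client, prompt: str, model_id: str = "amazon.titan-tg1-large") -> str:
    """Call Bedrock directly; runtime_client is e.g. the br_runtime client created above."""
    response = runtime_client.invoke_model(
        modelId=model_id,
        body=build_titan_request(prompt),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a streaming object; read() returns the JSON bytes.
    return parse_titan_response(response["body"].read())
```

For example, `invoke_titan(br_runtime, "Summarize: ...")` would return the model's text. This requires valid AWS credentials and access to the model in your region.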

### Download a public dataset

### Loading a text file with many tokens

The file `2022-letter.txt` contains Amazon's 2022 letter to shareholders. The following cell loads the file and counts its tokens.

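You will see a warning that the number of tokens exceeds the maximum sequence length (6526 > 1024). This warning comes from the GPT-2 tokenizer that LangChain uses by default to count tokens, so the count approximates Titan's tokenization rather than matching it exactly.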


In [11]:
# %%sh

# wget -O fannie-mf-commentary-oct-2023.pdf https://www.fanniemae.com/media/49331/display

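shareholder_letter = "./2022-letter.txt"

# Read the whole letter into a single string (UTF-8, since the text contains curly quotes)
with open(shareholder_letter, "r", encoding="utf-8") as file:
    letter = file.read()

# Approximate token count (LangChain's default counter uses a GPT-2 tokenizer)
llm.get_num_tokens(letter)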

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Token indices sequence length is longer than the specified maximum sequence length for this model (6526 > 1024). Running this sequence through the model will result in indexing errors


6526

### Splitting the long text into chunks

The text is too long to fit in a single prompt, so we split it into smaller chunks. LangChain's `RecursiveCharacterTextSplitter` tries the separators in order (here `separators=["\n\n", "\n"]`: paragraph breaks first, then line breaks). It keeps splitting until each chunk is no longer than `chunk_size` characters. Splitting on paragraph breaks first helps keep each paragraph inside a single chunk.

With 4,000 characters per chunk and a 100-character overlap between neighboring chunks (`chunk_overlap=100`), each portion can be summarized separately. The number of tokens (word pieces) in a chunk depends on the text.
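To make the splitting behavior concrete, here is a simplified pure-Python sketch of recursive character splitting. `recursive_split` is a hypothetical helper, not LangChain's implementation: it ignores `chunk_overlap` and whitespace stripping. It splits on the first separator and merges adjacent pieces while they fit in `chunk_size`. Any piece that is still too long is split again with the next separator, and as a last resort it is cut at fixed character positions.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n")) -> List[str]:
    """Simplified recursive character splitter (no overlap), for illustration only."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard-cut the text into chunk_size pieces.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        if len(part) > chunk_size:
            # Flush the pending chunk, then split the oversized part with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(part, chunk_size, finer))
            continue
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # the part still fits: keep merging
        else:
            chunks.append(current)  # the pending chunk is full: start a new one
            current = part
    if current:
        chunks.append(current)
    return chunks
```

For example, five 30-character paragraphs with `chunk_size=70` produce chunks of 62, 62, and 30 characters. Two paragraphs fit per chunk, joined by `"\n\n"`.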


In [12]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=4000, chunk_overlap=100
)

docs = text_splitter.create_documents([letter])

In [13]:
num_docs = len(docs)

num_tokens_first_doc = llm.get_num_tokens(docs[0].page_content)

print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)

Now we have 10 documents and the first one has 439 tokens


### Summarizing chunks and combining them

Assuming the other chunks have a similar number of tokens, we should be good to go. Let's use LangChain's [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html) to summarize the text. `load_summarize_chain` provides three ways to summarize: `stuff`, `map_reduce`, and `refine`.
- `stuff` puts all the chunks into one prompt, so with a long document it would exceed the model's token limit.
- `map_reduce` summarizes each chunk, combines the summaries, and then summarizes the combined summary. If the combined summary is too large, it raises an error.
- `refine` summarizes the first chunk, and then summarizes the second chunk with the first summary. The same process repeats until all chunks are summarized.

`map_reduce` and `refine` invoke the LLM multiple times, so they take longer to produce the final summary.
Let's try `map_reduce` here. 
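The `map_reduce` strategy described above can be sketched in plain Python. `summarize` is a hypothetical prompt-to-text callable, and the size check uses characters as a rough stand-in for tokens. This shows the idea only. LangChain's chain uses its own prompts and token-based limits.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str],
                         max_combined_chars: int = 4000) -> str:
    """Sketch of map_reduce summarization; characters approximate tokens here."""
    # Map step: each chunk is summarized independently of the others.
    partial_summaries = [summarize(f"Write a concise summary of:\n{chunk}") for chunk in chunks]
    # Combine step: join the partial summaries into one text.
    combined = "\n".join(partial_summaries)
    # As noted above, a combined summary that is still too large is an error.
    if len(combined) > max_combined_chars:
        raise ValueError(
            f"Combined summaries are {len(combined)} characters; limit is {max_combined_chars}"
        )
    # Reduce step: summarize the combined summaries into the final summary.
    return summarize(f"Write a concise summary of:\n{combined}")
```

Unlike `refine`, the map calls do not depend on one another, so they could in principle run in parallel.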


### Option 1. Use the map-reduce pattern in LangChain

In [14]:
# verbose=True prints the prompts being used; set it to False to hide them
from langchain.chains.summarize import load_summarize_chain

summary_chain = load_summarize_chain(llm=llm, chain_type="map_reduce", verbose=True)

In [15]:
%%time
# map_reduce may raise a ValueError if the combined chunk summaries are still too long for the model
output = summary_chain.run(docs)




In [16]:
print(output)


Jeff Bezos, the CEO of Amazon, remains positive and enthusiastic about the company's future despite the challenging macroeconomic environment in 2022. Amazon has increased demand, innovated in its largest businesses, and made adjustments to its investment decisions. The company has undergone constant change in its 25 years, from being a books-only retailer to selling nearly every physical and digital retail item and building a business around technology infrastructure services in the cloud. Amazon has taken a deep look across the company, business by business, invention by invention, and asked themselves whether they had conviction about each initiative's long-term potential to drive enough revenue, operating income, free cash flow, and return on invested capital. This has led to the closure of certain businesses, the amendment of programs, and the reprioritization of resources.

Amazon has announced that corporate employees will be required to return to the office at least three days

### LangChain Expression Language (LCEL)

In [17]:
from langchain.prompts import PromptTemplate
from langchain.output_parsers import XMLOutputParser
from langchain.schema.output_parser import StrOutputParser

# The XML parser is used only to generate format instructions that ask the model
# to wrap each insight in <insight> tags; the chain itself returns plain strings.
xml_parser = XMLOutputParser(tags=['insight'])
str_parser = StrOutputParser()

prompt = PromptTemplate(
    template="""
    
    Human:
    {instructions} : \"{document}\"
    Format help: {format_instructions}.
    Assistant:""",
    input_variables=["instructions","document"],
    partial_variables={"format_instructions": xml_parser.get_format_instructions()},
)

insight_chain = prompt | llm | StrOutputParser()
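The expression `prompt | llm | StrOutputParser()` composes three steps so that each step's output becomes the next step's input. The toy sketch below imitates that idea with a hypothetical `Step` class that overloads `|`. It is not LangChain's `Runnable` implementation. The fake model is a stand-in so the example runs without Bedrock.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for an LCEL runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b returns a new Step that runs a, then feeds its output into b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Toy stand-ins for the prompt template, the model, and the string output parser.
toy_prompt = Step(lambda d: f"{d['instructions']} : \"{d['document']}\"")
toy_llm = Step(lambda text: f"  <insight>{len(text)} chars seen</insight>  ")
toy_parser = Step(lambda out: out.strip())

toy_chain = toy_prompt | toy_llm | toy_parser
```

Calling `toy_chain.invoke({"instructions": "Summarize", "document": "abc"})` formats the prompt, passes it to the fake model, and strips the result, in that order.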

### Option 2. Manually process insights, then summarize

In [18]:
%%time
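# Extract key insights from each chunk, one model call per chunk
insights = []
for doc in docs:
    insights.append(
        insight_chain.invoke({
            "instructions": "Provide key insights from the following text",
            # Pass the string itself; wrapping it in {...} would create a Python set
            "document": doc.page_content,
        })
    )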

CPU times: user 92.6 ms, sys: 529 µs, total: 93.2 ms
Wall time: 5min 16s
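The loop above makes one model call per chunk in sequence, which accounts for most of the wall time. The chunks are independent, so the calls could be issued concurrently, provided your Bedrock throughput quotas allow it. LCEL runnables offer `batch()` for this (for example, `insight_chain.batch(inputs, config={"max_concurrency": 4})`). The sketch below shows the same idea with the standard library. `extract_insights_parallel` and `get_insights` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def extract_insights_parallel(chunks: List[str], get_insights: Callable[[str], str],
                              max_workers: int = 4) -> List[str]:
    """Run one model call per chunk on a thread pool; results keep the chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even if calls finish out of order.
        return list(pool.map(get_insights, chunks))
```

In this notebook, `get_insights` could be `lambda text: insight_chain.invoke({"instructions": "Provide key insights from the following text", "document": text})`, applied to `[doc.page_content for doc in docs]`.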


In [20]:
str_parser = StrOutputParser()

prompt = PromptTemplate(
    template="""
    
    Human:
    {instructions} : \"{document}\"
    Assistant:""",
    input_variables=["instructions","document"]
)

summary_chain = prompt | llm | StrOutputParser()

In [21]:
%%time
print(summary_chain.invoke({
    "instructions": "You will be provided with multiple sets of insights. Compile and summarize these insights and provide key takeaways in one concise paragraph. Do not use the original xml tags. Just provide a paragraph with your compiled insights.",
    # Join the per-chunk insights into one string (not a set)
    "document": "\n".join(insights),
}))

 Here are the key insights from the provided text:

Amazon has been using machine learning extensively for 25 years, employing it in everything from personalized ecommerce recommendations to fulfillment center pick paths, to drones for Prime Air, to Alexa, to the many machine learning services AWS offers. Large Language Models (“LLMs”) and Generative AI are core to setting Amazon up to invent in every area of our business for many decades to come. Machine learning has been a technology with high promise for several decades, but it’s only been the last five to ten years that it’s started to be used more pervasively by companies. AWS is offering the most price-performant machine learning chips in Trainium and Inferentia so small and large companies can afford to train and run their LLMs in production. LLMs and Generative AI are going to be a big deal for customers, our shareholders, and Amazon. While we have a consumer business that’s $434B in 2022, the vast majority of total market segm

### Conclusion

You have now used LangChain with the Amazon Bedrock API (through a boto3 `bedrock-runtime` client) to summarize a document that is too long for a single prompt. Using Amazon Titan Text Large, you tried two approaches: LangChain's built-in `map_reduce` summarize chain, and an LCEL chain that extracts key insights from each chunk and then compiles them into a final summary.

#### Take aways
- Adapt this notebook to experiment with different models available through Amazon Bedrock such as Amazon Titan and AI21 Labs Jurassic models.
- Change the prompts to suit your specific use case and evaluate the output of different models.
- Experiment with the chunk size and maximum token count to understand their effect on latency and responsiveness.
- Apply different prompt engineering principles to get better outputs.

### Restart Kernel

In [None]:
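# Restart the kernel (this JavaScript works in the classic Jupyter Notebook;
# in JupyterLab or SageMaker Studio, restart from the Kernel menu instead)
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")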