# Text Summarization using Amazon Bedrock API 
## GenAI Code Accelerator 
Author: Sundaresan Manoharan - Enterprise Architecture AI/ML Team

Text summarization in Natural Language Processing (NLP) is the process of condensing a long text into a shorter version that preserves its key information and meaning. Machine learning and deep learning models extract or generate the important points and present them in a concise, coherent format, which makes it possible to digest large volumes of content efficiently. Summarization is a key capability of LLMs, with applications across industries that improve understanding and save time. This notebook demonstrates text summarization using the Amazon Bedrock API.

Note: This approach can be used when the input text or file fits within the model context length. 

Challenge: A key challenge is handling large documents that exceed the model's token limit. Another is obtaining high-quality summaries. The advanced topics explore an approach for documents that exceed the token limit and show how to measure summarization quality.

Use Cases:
- Meeting/Call Transcripts
- Policy, Legal, Government Documents
- Books, Articles, Blogs, Research Papers
- Financial Reports 

Foundation Model(s):
- Amazon Titan Large
- Meta Llama 2 Chat 13B

This notebook introduces Text Summarization using Amazon Bedrock API.  
- Uses various Foundation Models (LLM agnostic)
- Uses a PDF document (Earnings Call Transcript, Business/Financial Reports)
- Uses a simple, easy-to-adapt, bite-sized code accelerator

### Install Libraries

In [2]:
# update the pip installer
!pip install --upgrade pip

# install boto and AWS CLI library
!pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

# install PDF Reader library
!pip install PyPDF2


Collecting boto3>=1.28.57
  Using cached boto3-1.34.19-py3-none-any.whl.metadata (6.6 kB)
Collecting awscli>=1.29.57
  Using cached awscli-1.32.19-py3-none-any.whl.metadata (11 kB)
Collecting botocore>=1.31.57
  Using cached botocore-1.34.19-py3-none-any.whl.metadata (5.6 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3>=1.28.57)
  Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.11.0,>=0.10.0 (from boto3>=1.28.57)
  Using cached s3transfer-0.10.0-py3-none-any.whl.metadata (1.7 kB)
Collecting docutils<0.17,>=0.10 (from awscli>=1.29.57)
  Using cached docutils-0.16-py2.py3-none-any.whl (548 kB)
Collecting PyYAML<6.1,>=3.10 (from awscli>=1.29.57)
  Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting colorama<0.4.5,>=0.2.5 (from awscli>=1.29.57)
  Using cached colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting rsa<4.8,>=3.1.2 (from awscli>=1.29.57)
  Using cached rsa-4.7.2-py3-none-any.whl (

### Import Libraries

In [3]:
import json
import os
import sys
import re
import pandas as pd

import boto3
import botocore

from IPython.display import display_markdown, Markdown, clear_output
from PyPDF2 import PdfReader


### Initialize boto session

In [4]:
# module_path = ".."
# sys.path.append(os.path.abspath(module_path))

boto_session = boto3.Session()
aws_region = boto_session.region_name
print(aws_region)
br_client = boto_session.client("bedrock", region_name=aws_region)
br_runtime = boto_session.client("bedrock-runtime", region_name=aws_region)


us-east-1


### Test Connection & List Foundation Models

In [5]:
fms = br_client.list_foundation_models()['modelSummaries']
dfFM = pd.DataFrame(fms)
print(dfFM.shape)
dfFM.head()

(45, 10)


| | modelArn | modelId | modelName | providerName | inputModalities | outputModalities | responseStreamingSupported | customizationsSupported | inferenceTypesSupported | modelLifecycle |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-tg1-large | Titan Text Large | Amazon | [TEXT] | [TEXT] | True | [] | [ON_DEMAND] | {'status': 'ACTIVE'} |
| 1 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-image-generator-v1:0 | Titan Image Generator G1 | Amazon | [TEXT, IMAGE] | [IMAGE] | | [FINE_TUNING] | [ON_DEMAND, PROVISIONED] | {'status': 'ACTIVE'} |
| 2 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-image-generator-v1 | Titan Image Generator G1 | Amazon | [TEXT, IMAGE] | [IMAGE] | | [] | [ON_DEMAND] | {'status': 'ACTIVE'} |
| 3 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-embed-g1-text-02 | Titan Text Embeddings v2 | Amazon | [TEXT] | [EMBEDDING] | | [] | [ON_DEMAND] | {'status': 'ACTIVE'} |
| 4 | arn:aws:bedrock:us-east-1::foundation-model/am... | amazon.titan-text-lite-v1:0:4k | Titan Text G1 - Lite | Amazon | [TEXT] | [TEXT] | True | [FINE_TUNING, CONTINUED_PRE_TRAINING] | [PROVISIONED] | {'status': 'ACTIVE'} |


In [6]:
dfFM.columns

Index(['modelArn', 'modelId', 'modelName', 'providerName', 'inputModalities',
       'outputModalities', 'responseStreamingSupported',
       'customizationsSupported', 'inferenceTypesSupported', 'modelLifecycle'],
      dtype='object')

In [7]:
dfFM.modelName.unique()

array(['Titan Text Large', 'Titan Image Generator G1',
       'Titan Text Embeddings v2', 'Titan Text G1 - Lite',
       'Titan Text G1 - Express', 'Titan Embeddings G1 - Text',
       'Titan Multimodal Embeddings G1', 'SDXL 0.8', 'SDXL 1.0',
       'J2 Grande Instruct', 'J2 Jumbo Instruct', 'Jurassic-2 Mid',
       'Jurassic-2 Ultra', 'Claude Instant', 'Claude', 'Command',
       'Command Light', 'Embed English', 'Embed Multilingual',
       'Llama 2 Chat 13B', 'Llama 2 Chat 70B', 'Llama 2 13B',
       'Llama 2 70B'], dtype=object)
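The listing mixes text, image, and embedding models, so it can help to narrow it to models that suit summarization. The following is a minimal sketch, not part of the original notebook: the helper name `text_models` and its filtering rules (text in, text out, active, given inference type) are assumptions chosen for illustration, and the sample rows copy only the fields the filter reads from the table above.

```python
def text_models(model_summaries, inference_type="ON_DEMAND"):
    """Return modelIds of active text-to-text models that support the given inference type.

    `model_summaries` is expected to look like the 'modelSummaries' list
    returned by list_foundation_models() (a list of dicts).
    """
    selected = []
    for summary in model_summaries:
        if summary.get("inputModalities") != ["TEXT"]:
            continue  # skip multimodal inputs such as [TEXT, IMAGE]
        if "TEXT" not in summary.get("outputModalities", []):
            continue  # skip image generators and embedding models
        if inference_type not in summary.get("inferenceTypesSupported", []):
            continue
        if summary.get("modelLifecycle", {}).get("status") != "ACTIVE":
            continue
        selected.append(summary["modelId"])
    return selected


# Sample rows taken from the listing above (reduced to the relevant fields).
sample = [
    {"modelId": "amazon.titan-tg1-large", "inputModalities": ["TEXT"],
     "outputModalities": ["TEXT"], "inferenceTypesSupported": ["ON_DEMAND"],
     "modelLifecycle": {"status": "ACTIVE"}},
    {"modelId": "amazon.titan-image-generator-v1", "inputModalities": ["TEXT", "IMAGE"],
     "outputModalities": ["IMAGE"], "inferenceTypesSupported": ["ON_DEMAND"],
     "modelLifecycle": {"status": "ACTIVE"}},
    {"modelId": "amazon.titan-embed-g1-text-02", "inputModalities": ["TEXT"],
     "outputModalities": ["EMBEDDING"], "inferenceTypesSupported": ["ON_DEMAND"],
     "modelLifecycle": {"status": "ACTIVE"}},
    {"modelId": "amazon.titan-text-lite-v1:0:4k", "inputModalities": ["TEXT"],
     "outputModalities": ["TEXT"], "inferenceTypesSupported": ["PROVISIONED"],
     "modelLifecycle": {"status": "ACTIVE"}},
]

print(text_models(sample))
```

In the notebook, the same helper could be called as `text_models(fms)` on the full list retrieved earlier.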

## Text Summarization

### Download a public dataset

In [8]:
%%sh

wget -O fannie-mf-commentary-oct-2023.pdf https://www.fanniemae.com/media/49331/display

--2024-01-16 17:25:04--  https://www.fanniemae.com/media/49331/display
Resolving www.fanniemae.com (www.fanniemae.com)... 104.18.26.25, 104.18.27.25, 2606:4700::6812:1a19, ...
Connecting to www.fanniemae.com (www.fanniemae.com)|104.18.26.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 249442 (244K) [application/pdf]
Saving to: ‘fannie-mf-commentary-oct-2023.pdf’

     0K .......... .......... .......... .......... .......... 20% 33.1M 0s
    50K .......... .......... .......... .......... .......... 41% 21.4M 0s
   100K .......... .......... .......... .......... .......... 61% 7.68M 0s
   150K .......... .......... .......... .......... .......... 82%  198M 0s
   200K .......... .......... .......... .......... ...       100%  264M=0.01s

2024-01-16 17:25:04 (22.6 MB/s) - ‘fannie-mf-commentary-oct-2023.pdf’ saved [249442/249442]



### Read and Extract Text from PDF File

In [9]:
filename = 'fannie-mf-commentary-oct-2023.pdf'
reader = PdfReader(filename)
print("Total Pages:", len(reader.pages))


Total Pages: 5


In [10]:
pages = []

for idx, page in enumerate(reader.pages):
    print("Page ", idx + 1, "\n")
    text = page.extract_text(0)  # 0 = extract only text in the upright (0-degree) orientation
    pages.append(text)
    print(text, "\n\n")

Page  1 

1Multifamily Economic and Market Commentary
OCTOBER 2023
Rising Number of Multifamily Properties Offering Concessions
Multifamily market fundamentals have softened in 2023 compared to the prior year, the result of mixed economic 
trends including slowing -but-still -positive job growth, elevated single -family housing prices keeping many renters in 
place, and continued favorable demographics. Rent growth was exceptional over the past two years, and, of course, 
unsustainable, thus 2023 has seen a substantial slowing of rent growth rates. There remains a robust pipeline of new 
apartment rental projects that are underway in the nation’s largest metros, and with recessionary concerns there has 
been a rise in the number of properties across the country offering concessions.
In the multifamily apartment rental market, concessions are incentives with an economic value for renters, such as 
periods of free rent, free utilities, or other amenities. And in a competitive market when

In [11]:
# combine extracted text from all pages
all_text = "\n".join(pages)
all_text

'1Multifamily Economic and Market Commentary\nOCTOBER 2023\nRising Number of Multifamily Properties Offering Concessions\nMultifamily market fundamentals have softened in 2023 compared to the prior year, the result of mixed economic \ntrends including slowing -but-still -positive job growth, elevated single -family housing prices keeping many renters in \nplace, and continued favorable demographics. Rent growth was exceptional over the past two years, and, of course, \nunsustainable, thus 2023 has seen a substantial slowing of rent growth rates. There remains a robust pipeline of new \napartment rental projects that are underway in the nation’s largest metros, and with recessionary concerns there has \nbeen a rise in the number of properties across the country offering concessions.\nIn the multifamily apartment rental market, concessions are incentives with an economic value for renters, such as \nperiods of free rent, free utilities, or other amenities. And in a competitive market whe
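The extracted text shows typical PDF artifacts, such as compounds split as "slowing -but-still -positive" or "single -family" and trailing spaces before line breaks. A light clean-up pass before prompting may help. This is a sketch under assumptions: `clean_extracted_text` is a hypothetical helper, and the regular expressions target only the patterns visible in this document, so they could also alter legitimate text in other PDFs.

```python
import re


def clean_extracted_text(text: str) -> str:
    """Tidy common PDF-extraction artifacts (heuristic, tuned to this document)."""
    # Rejoin compounds extracted as "word -word" (e.g. "single -family").
    # Assumption: a space followed by a hyphen between two letters is an artifact.
    text = re.sub(r"(?<=[A-Za-z]) -(?=[A-Za-z])", "-", text)
    # Remove spaces and tabs left at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    return text.strip()


raw = ("slowing -but-still -positive job growth, elevated single -family "
       "housing prices \nkeeping many renters in place")
print(clean_extracted_text(raw))
```

In the notebook, the helper could be applied as `all_text = clean_extracted_text(all_text)` before building the prompt.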

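### Approximate Token Count

In [12]:
# approximate the token count by counting word-like units
# (the model's own tokenizer will usually report a different, typically higher, number)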
len(re.findall(r"[\w']+", all_text))

1674
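Because the regular expression counts words rather than model tokens, a rough conversion factor can be used to check whether the text is likely to fit in the context window. The sketch below is illustrative only: the helper names, the default of 1.3 tokens per word, and any context size passed in are assumptions, not values taken from Amazon Bedrock documentation. Check the limits of the model you use.

```python
import math
import re


def estimate_tokens(text, tokens_per_word=1.3):
    """Estimate the token count from a word count (tokens_per_word is an assumed ratio)."""
    words = len(re.findall(r"[\w']+", text))
    return math.ceil(words * tokens_per_word)


def fits_context(text, context_tokens, reserved_output_tokens=256, tokens_per_word=1.3):
    """Return True if the estimated prompt tokens plus the reserved output budget fit."""
    needed = estimate_tokens(text, tokens_per_word) + reserved_output_tokens
    return needed <= context_tokens


# context_tokens=4000 is a placeholder value for illustration, not a documented limit.
print(fits_context("Rent growth has slowed in 2023.", context_tokens=4000))
```

Reserving room for the response matters because the output budget, such as the maximum token count in the request, shares the context window with the prompt.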

### Prompt Engineering

Prompts are a specific set of inputs provided by you, the user, that guide LLMs on Amazon Bedrock to generate an appropriate response or output for a given task or instruction. Prompt engineering refers to the practice of crafting and optimizing input prompts by selecting appropriate words, phrases, sentences, punctuation, and separator characters to effectively use LLMs for a wide variety of applications. In other words, prompt engineering is the art of communicating with an LLM. High-quality prompts condition the LLM to generate desired or better responses.

Summarization: The prompt is a passage of text, and the model must respond with a shorter passage that captures the main points of the input. 
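A summarization prompt of this kind can be assembled by a small helper so that instructions are easy to vary between experiments. This is a minimal sketch: `build_summary_prompt`, its parameters, and the `<text>` delimiters are one possible layout chosen for illustration, not a format required by Amazon Bedrock.

```python
def build_summary_prompt(text, grounded=True, max_sentences=None):
    """Build a summarization prompt with optional length and grounding instructions."""
    instructions = ["Please provide a summary of the following text."]
    if max_sentences is not None:
        instructions.append(f"Use at most {max_sentences} sentences.")
    if grounded:
        # Ask the model to stay within the source text.
        instructions.append(
            "Do not add any information that is not mentioned in the text below."
        )
    header = " ".join(instructions)
    # Delimit the passage so the model can tell instructions from content.
    return f"\n{header}\n\n<text>\n{text}\n</text>\n\n"


print(build_summary_prompt("Rent growth slowed in 2023.", max_sentences=2))
```

Keeping the instructions in a list makes it straightforward to compare how each one changes the model output.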


In [13]:
# Do not add any information that is not mentioned in the text below.

prompt = f"""
Please provide a summary of the following text. 

<text>
{all_text}
</text>

"""

### LLM Inference Parameters
Inference parameters are values that you can adjust to limit or influence the model response. 

#### Randomness and Diversity
For any given sequence, a model determines a probability distribution of options for the next token in the sequence. To generate each token in an output, the model samples from this distribution. Randomness and diversity refer to the amount of variation in a model's response.

Temperature – Affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs.

- Choose a lower value to influence the model to select higher-probability outputs.
- Choose a higher value to influence the model to select lower-probability outputs.

In technical terms, the temperature modulates the probability mass function for the next token. A lower temperature steepens the function and leads to more deterministic responses, and a higher temperature flattens the function and leads to more random responses.
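The effect of temperature can be illustrated with a temperature-scaled softmax over a few made-up scores. This is a conceptual sketch, not a description of how any Bedrock model is implemented: the logits are invented, and treating a temperature of 0 as a greedy choice is an assumption made here to avoid division by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 means "always pick the highest-scoring token".
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Running the loop shows the top candidate's share of probability shrinking as the temperature rises, which is the flattening described above.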

Top K – The number of most-likely candidates that the model considers for the next token.

- Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.
- Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

For example, if you choose a value of 50 for Top K, the model selects from 50 of the most probable tokens that could be next in the sequence.

Top P – The percentage of most-likely candidates that the model considers for the next token.

- Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.
- Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

In technical terms, the model computes the cumulative probability distribution for the set of responses and considers only the top P% of the distribution. For example, if you choose a value of 0.8 for Top P, the model selects from the top 80% of the probability distribution of tokens that could be next in the sequence.
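Top K and Top P both shrink the candidate pool before sampling. The sketch below shows one common way to implement them (Top P as "keep the smallest set of most probable tokens whose cumulative probability reaches P"). The function names and the toy distribution are assumptions for illustration, and a given model's internal implementation may differ.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Invented next-token distribution for illustration.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
print(list(top_k_filter(probs, 2)))    # candidates left after Top K = 2
print(list(top_p_filter(probs, 0.9)))  # candidates left after Top P = 0.9
```

Top K fixes the pool size regardless of how confident the model is, whereas Top P lets the pool grow when the distribution is flat and shrink when it is peaked.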

#### Length
Foundation models typically support parameters that limit the length of the response. 

Response length – An exact value to specify the minimum or maximum number of tokens to return in the generated response.

Penalties – Specify the degree to which to penalize outputs in a response. Examples include the following:

- The length of the response.
- Repeated tokens in a response.
- Frequency of tokens in a response.
- Types of tokens in a response.

Stop sequences – Specify sequences of characters that stop the model from generating further tokens. If the model generates a stop sequence that you specify, it will stop generating after that sequence.



### Invoke Bedrock FM API
#### Amazon Titan Large Model
This cell sends an API request to Amazon Bedrock, specifying the request parameters modelId, accept, and contentType. Given the prompt, the foundation model summarizes the text. Here the Bedrock service returns the entire summary in a single response, which can be slow when the output contains a large number of tokens.

In [14]:
%%time

body = json.dumps({"inputText": prompt, 
                   "textGenerationConfig":{
                       "maxTokenCount":256,
                       "stopSequences":[],
                       "temperature":0,
                       "topP":1
                   },
                  }) 

modelId = 'amazon.titan-tg1-large' # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'

try:
    
    response = br_runtime.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    output_text = response_body.get('results')[0].get('outputText')
    print(len(re.findall(r"[\w']+", output_text)))
    print(output_text)

except botocore.exceptions.ClientError as error:    
    raise error

181
The number of multifamily properties offering concessions has increased in 2023 due to a mix of economic trends, including slowing but still positive job growth, elevated single-family housing prices, and favorable demographics. Rent growth has slowed significantly, and there is a robust pipeline of new apartment rental projects underway in the nation's largest metros. Concessions are incentives with an economic value for renters, such as periods of free rent, free utilities, or other amenities. The value of the concession being offered has slightly declined, from 7.7% of asking rent in August 2022 to 7.2% in August 2023. Supply of new units remains high, with nearly 730,000 units underway with completion dates expected in 2023, though only 164,000 were completed year to date through April.

The number of apartment units completed across the country may reach a record in 2023, with nearly 730,000 units underway with completion dates expected in 2023. New apartment supply has been c
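The `except` clause in the cell above re-raises the error unchanged. When experimenting, it can be helpful to print a short explanation first. The sketch below works on the `response` dictionary that a botocore `ClientError` carries. The helper name `explain_client_error` and the hint texts are illustrative assumptions, and only a few error codes are covered.

```python
def explain_client_error(error_response):
    """Summarize the 'Error' section of a botocore ClientError response dict."""
    err = error_response.get("Error", {})
    code = err.get("Code", "Unknown")
    message = err.get("Message", "")
    # Illustrative hints only; consult the service documentation for details.
    hints = {
        "AccessDeniedException": "check IAM permissions and model access in the Bedrock console",
        "ThrottlingException": "slow down or retry with backoff",
        "ValidationException": "check the request body and the input length",
    }
    hint = hints.get(code, "see the error message for details")
    return f"{code}: {message} ({hint})"


# Example with a hand-written dict in the shape of ClientError.response.
print(explain_client_error(
    {"Error": {"Code": "ThrottlingException", "Message": "Too many requests"}}
))
```

Inside the notebook this could be used as `print(explain_client_error(error.response))` before the `raise` statement.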

### Streaming LLM Output

In [15]:
%%time

try:
    response = br_runtime.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
    stream = response.get('body')
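    output = []
    if stream:
        for event in stream:
            chunk = event.get('chunk')
            if chunk:
                # each chunk carries a JSON payload with the next piece of generated text
                chunk_obj = json.loads(chunk.get('bytes').decode())
                text = chunk_obj['outputText']
                output.append(text)
                # redraw the accumulated text so the summary grows in place
                clear_output(wait=True)
                display_markdown(Markdown(''.join(output)))

    # replace the rendered Markdown with the final plain-text summary
    clear_output(wait=True)
    print(''.join(output))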

except botocore.exceptions.ClientError as error:
    raise error

The number of multifamily properties offering concessions has increased in 2023 due to a mix of economic trends, including slowing but still positive job growth, elevated single-family housing prices, and favorable demographics. Rent growth has slowed significantly, and there is a robust pipeline of new apartment rental projects underway in the nation's largest metros. Concessions are incentives with an economic value for renters, such as periods of free rent, free utilities, or other amenities. The value of the concession being offered has slightly declined, from 7.7% of asking rent in August 2022 to 7.2% in August 2023. Supply of new units remains high, with nearly 730,000 units underway with completion dates expected in 2023, though only 164,000 were completed year to date through April.

The number of apartment units completed across the country may reach a record in 2023, with nearly 730,000 units underway with completion dates expected in 2023. New apartment supply has been conce
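The chunk handling in the streaming cell can be separated from the display logic with a small generator, which also makes it testable without calling the service. This is a sketch under assumptions: `iter_stream_text` is a hypothetical helper, the fake events only imitate the `chunk`/`bytes` structure used in the cell above, and `outputText` is the key used by the Titan model in this notebook (other models may use a different key).

```python
import json


def iter_stream_text(event_stream, text_key="outputText"):
    """Yield the text fragment from each chunk event, skipping other events."""
    for event in event_stream:
        chunk = event.get("chunk")
        if not chunk:
            continue
        payload = json.loads(chunk["bytes"].decode("utf-8"))
        text = payload.get(text_key)
        if text:
            yield text


# Hand-made events that mimic the structure consumed in the cell above.
fake_stream = [
    {"chunk": {"bytes": json.dumps({"outputText": "Rent growth "}).encode("utf-8")}},
    {"other": {}},  # non-chunk events are ignored
    {"chunk": {"bytes": json.dumps({"outputText": "has slowed."}).encode("utf-8")}},
]
print("".join(iter_stream_text(fake_stream)))
```

With a real response, the generator could be driven by `iter_stream_text(response.get('body'))` and each fragment rendered as it arrives.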

### Meta Llama 2 13B Chat Model
Here is an example of an API request that sends text to Meta Llama 2. The inference parameters depend on the model you use: Amazon Titan nests them under textGenerationConfig, whereas Meta Llama 2 takes them at the top level of the request body. The inference parameters of Meta Llama 2 are:

    response = br_runtime.invoke_model(body=json.dumps({"prompt": prompt,
                                                        "max_gen_len": 512,
                                                        "temperature": 0.2,
                                                        "top_p": 0.9}),
                                       modelId="meta.llama2-13b-chat-v1",
                                       accept=accept,
                                       contentType=contentType)


In [16]:
%%time

body = json.dumps({"prompt": prompt,
                   "max_gen_len": 512,
                   "temperature": 0.2,
                   "top_p": 0.9
                  })

modelId = 'meta.llama2-13b-chat-v1' # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'

try:
    response = br_runtime.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read().decode('utf-8'))
    print(response_body.get('generation'))
except botocore.exceptions.ClientError as error:    
    raise error

Here is a summary of the text:

The multifamily housing market is experiencing a softening of fundamentals, including a rise in the number of properties offering concessions, due to mixed economic trends and a robust pipeline of new apartment rental projects. The number of units completed across the country may reach a record in 2023, with the highest levels observed in New York City, Dallas, Houston, and Atlanta. The overall pipeline underway is slightly lower than last year, and supply chains are operating more efficiently. The multifamily concession rate has declined slightly, but the value of concessions remains well above pre-pandemic levels. Class A units have seen the highest level of concessions, while Class B and C units have seen a notable increase in the percentage of units offering concessions. The softening demand for apartments is expected to lead to more concessions and stagnant rent growth.
CPU times: user 4.84 ms, sys: 43 µs, total: 4.89 ms
Wall time: 6.1 s
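Since the two models differ only in the request body and in where the generated text sits in the response, a thin adapter can keep the rest of the code model-agnostic. The sketch below covers only the two formats used in this notebook. The function names `build_body` and `extract_text` and the prefix-based dispatch are assumptions for illustration, and other providers would need their own branches.

```python
import json


def build_body(model_id, prompt, max_tokens=512, temperature=0.2, top_p=0.9):
    """Build the JSON request body for the model families used in this notebook."""
    if model_id.startswith("amazon.titan"):
        payload = {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_tokens,
                "stopSequences": [],
                "temperature": temperature,
                "topP": top_p,
            },
        }
    elif model_id.startswith("meta.llama2"):
        payload = {
            "prompt": prompt,
            "max_gen_len": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
        }
    else:
        raise ValueError(f"No request format registered for {model_id}")
    return json.dumps(payload)


def extract_text(model_id, response_body):
    """Pull the generated text out of a parsed response body."""
    if model_id.startswith("amazon.titan"):
        return response_body["results"][0]["outputText"]
    if model_id.startswith("meta.llama2"):
        return response_body["generation"]
    raise ValueError(f"No response format registered for {model_id}")


print(build_body("meta.llama2-13b-chat-v1", "Summarize: ...", max_tokens=128))
```

With these helpers, switching models would only require changing the `modelId` string passed to `invoke_model`.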


### Conclusion

You have now experimented with the boto3 SDK, which provides direct access to the Amazon Bedrock API. Using this API, you generated a summary of a PDF file with two different foundation models and saw two ways of receiving the output: as a single complete response and as a stream.

#### Takeaways
- Adapt this notebook to experiment with different models available through Amazon Bedrock such as Amazon Titan and AI21 Labs Jurassic models.
- Change the prompts to your specific use case and evaluate the output of different models.
- Play with the token length to understand the latency and responsiveness of the service.
- Apply different prompt engineering principles to get better outputs.

### Restart Kernel

In [17]:
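# restart kernel
# note: this relies on the classic Jupyter Notebook JavaScript API and may have no effect in other front ends
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")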