# Long text summarization using LCEL chains on LangChain with Bedrock APIs

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

## Overview
Large documents pose several challenges: the input text might not fit within the model's context length, the model may hallucinate when given very long inputs, or processing may fail with out-of-memory errors.

To address these problems, this notebook shows a solution based on chunking and prompt chaining. The solution uses [LangChain](https://python.langchain.com/docs/get_started/introduction.html), a popular framework for developing applications powered by language models.

In this architecture:

1. A large document (or a single file built by appending smaller ones) is loaded
1. A LangChain utility splits it into multiple smaller chunks (chunking)
1. The first chunk is sent to the model, and the model returns the corresponding summary
1. LangChain takes the next chunk, appends it to the returned summary, and sends the combined text as a new request to the model; the process repeats until all chunks are processed
1. In the end, you have a final summary based on the entire content
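
The steps above can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `chained_summary` and `truncate_llm` are hypothetical names, and `truncate_llm` is a stand-in for a real model call so the loop can run without Bedrock access.

```python
from typing import Callable, List


def chained_summary(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize chunks in order, carrying the running summary forward."""
    summary = ""
    for chunk in chunks:
        # Append the next chunk to the summary so far, then summarize again.
        combined = f"{summary}\n\n{chunk}" if summary else chunk
        summary = summarize(combined)
    return summary


def truncate_llm(text: str, limit: int = 60) -> str:
    """Hypothetical stand-in for a model call: keeps the first `limit` characters."""
    return text[:limit]


chunks = ["First part of a long document.", "Second part.", "Third part."]
final_summary = chained_summary(chunks, truncate_llm)
```

In a real run, `summarize` would wrap a call to the model, and each request would contain only the running summary plus one chunk, which keeps every prompt within the context length.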

### Use case
This approach can be used to summarize call transcripts, meeting transcripts, books, articles, blog posts, and other relevant content.

### Imports

In [1]:
import os
import sys
from langchain_aws import ChatBedrockConverse
from IPython.core.display import HTML
from IPython.display import display_markdown, Markdown
import boto3

HTML("<script>Jupyter.notebook.kernel.restart()</script>")

module_path = ".."
sys.path.append(os.path.abspath(module_path))

boto3_bedrock = boto3.client('bedrock-runtime')

textgen_llm = ChatBedrockConverse(
    model_id="us.amazon.nova-micro-v1:0",
    client=boto3_bedrock,
    max_tokens=None,
    temperature=0.5
)


### Load shareholder letter

This summarization section follows a process similar to lab 02. First, let us load the 2022 Amazon shareholder letter.

In [2]:
shareholder_letter = "./letters/2022-letter.txt"

with open(shareholder_letter, "r") as file:
    letter = file.read()

In [3]:
len(letter.split(' '))

5084

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=8096, chunk_overlap=100
)

docs = text_splitter.create_documents([letter])
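
Note that `chunk_size` is measured in characters by default, not words or tokens. To make the idea of chunking concrete, here is a simplified sketch of greedy paragraph packing. It is not the algorithm `RecursiveCharacterTextSplitter` uses: the hypothetical `pack_paragraphs` helper splits only on blank lines, adds no overlap, and keeps an oversized paragraph whole.

```python
from typing import List


def pack_paragraphs(text: str, chunk_size: int) -> List[str]:
    """Greedily pack paragraphs into chunks of at most chunk_size characters.

    Simplifications: a single paragraph longer than chunk_size is kept whole,
    and no overlap is added between consecutive chunks.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= chunk_size or not current:
            # Still fits (or the chunk is empty): keep growing the current chunk.
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: start a new chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
sample_chunks = pack_paragraphs(sample, chunk_size=10)
```

The real splitter additionally falls back to finer separators (here `"\n"`) when a piece is still too large, and repeats up to `chunk_overlap` characters between neighbouring chunks.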

In [5]:
from langchain.prompts import PromptTemplate
from langchain.output_parsers import XMLOutputParser
from langchain.schema.output_parser import StrOutputParser


xml_parser = XMLOutputParser(tags=['insight'])
str_parser = StrOutputParser()

prompt = PromptTemplate(
    template="""
    
    Human:
    {instructions} : \"{document}\"
    Format help: {format_instructions}.
    Assistant:""",
    input_variables=["instructions","document"],
    partial_variables={"format_instructions": xml_parser.get_format_instructions()},
)

insight_chain = prompt | textgen_llm | StrOutputParser()
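
Because the chain ends in `StrOutputParser`, each result is a plain string that may still contain the `<insight>` tags requested by the format instructions. If you want the individual insights as a list, one option is a small post-processing step. The sketch below assumes the model actually emits well-formed `<insight>...</insight>` pairs, which is not guaranteed; `extract_insights` is a hypothetical helper, not part of LangChain.

```python
import re
from typing import List


def extract_insights(model_output: str) -> List[str]:
    """Return the text inside each <insight>...</insight> pair, in order."""
    # DOTALL lets an insight span several lines; the non-greedy match
    # stops at the first closing tag.
    matches = re.findall(r"<insight>(.*?)</insight>", model_output, flags=re.DOTALL)
    return [match.strip() for match in matches]


example_output = "<insight> Demand grew in 2022 </insight>\n<insight>AWS kept expanding</insight>"
parsed = extract_insights(example_output)
```

If the model returns no tags at all, the function returns an empty list, so callers can fall back to the raw string.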

In [6]:
print(f"Number of Documents {len(docs)}")

Number of Documents 5


# Option 1. Manually process insights, then summarize

In [7]:
%%time
insights = []
for doc in docs:
    # Pass each chunk's text to the chain as a plain string.
    insights.append(
        insight_chain.invoke({
            "instructions": "Provide Key insights from the following text",
            "document": doc.page_content,
        })
    )

CPU times: user 39.6 ms, sys: 8.97 ms, total: 48.5 ms
Wall time: 7.62 s


In [8]:
str_parser = StrOutputParser()

prompt = PromptTemplate(
    template="""
    
    Human:
    {instructions} : \"{document}\"
    Assistant:""",
    input_variables=["instructions","document"]
)

summary_chain = prompt | textgen_llm | StrOutputParser()

In [9]:
%%time
# Join the per-chunk insights into one string and ask the model to condense them.
display_markdown(Markdown(summary_chain.invoke({
        "instructions":"You will be provided with multiple sets of insights. Compile and summarize these insights and provide key takeaways in one concise paragraph. Do not use the original xml tags. Just provide a paragraph with your compiled insights.",
        "document": '\n'.join(insights)
    })))

Despite macroeconomic challenges and internal operational difficulties in 2022, Amazon demonstrated resilience by growing demand and innovating across its largest businesses, emphasizing long-term investments and strategic adjustments. The company's consumer business saw significant growth, with revenue increasing from $245B in 2019 to $434B in 2022, doubling its fulfillment center footprint and expanding its last-mile transportation network to the size of UPS. Amazon Web Services (AWS) continued to thrive with a $85B revenue run rate and a 29% year-over-year growth rate, despite cautious spending by companies, driven by robust new customer pipeline and over 3,300 new features and services launched in 2022. Amazon's Advertising business also grew rapidly, leveraging machine learning algorithms and planning/measurement solutions to enhance its capabilities. Additionally, Amazon is making significant investments in machine learning and exploring opportunities to integrate advertising into various products. The company's international expansion and growth in its grocery business, including Whole Foods and Amazon Fresh, also highlight its strategic focus on diverse markets and segments. Amazon Business has thrived by leveraging its ecommerce and logistics capabilities to provide procurement solutions for businesses, driving $35B in annualized gross sales. Amazon's leadership remains optimistic about future opportunities, particularly in large language models and generative AI, which are expected to transform customer experiences across all platforms.

CPU times: user 6.47 ms, sys: 0 ns, total: 6.47 ms
Wall time: 1.62 s


# Option 2. Use the map-reduce pattern in LangChain

In [10]:
from langchain.chains.summarize import load_summarize_chain
summary_chain = load_summarize_chain(llm=textgen_llm, chain_type="map_reduce", verbose=False, token_max=1024)

In [None]:
%%time
display_markdown(Markdown(summary_chain.invoke(docs)['output_text']))

In his second annual shareholder letter, Amazon CEO Andy Jassy expresses optimism about Amazon's future despite macroeconomic challenges. He highlights Amazon's growth and innovation across sectors like its third-party marketplace, AWS, and new products like Kindle and Alexa. Jassy stresses long-term investments and adaptability, noting Amazon's past success in navigating downturns. He mentions recent decisions to close underperforming businesses and reduce corporate roles by 27,000. Amazon's consumer business surged during the pandemic, doubling its fulfillment centers and expanding its transportation network. AWS continues to grow with an $85B revenue run rate, focusing on long-term customer relationships and launching new features like Graviton3 processors. Amazon's Advertising business is expanding rapidly due to effective machine learning-based advertising. Amazon is also expanding into new markets like grocery, business procurement, healthcare, and broadband internet through initiatives like Amazon Pharmacy, One Medical, and the Kuiper satellite project. Amazon invests heavily in Large Language Models and Generative AI to drive innovation and enhance customer experiences, confident in its future growth driven by relentless innovation and customer focus.

CPU times: user 5.07 s, sys: 23.4 s, total: 28.5 s
Wall time: 3min 19s
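
Conceptually, the map-reduce pattern used in Option 2 summarizes every chunk independently (map) and then summarizes the combined partial summaries (reduce). The following is a rough sketch of that idea under stated assumptions, not the internals of `load_summarize_chain`: `map_reduce_summary` and `bracket_llm` are hypothetical names, the model call is replaced by a stub, and the sketch omits any step that would further collapse partial summaries when their combined size exceeds a limit such as `token_max`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_reduce_summary(
    chunks: List[str],
    summarize: Callable[[str], str],
    max_workers: int = 4,
) -> str:
    """Summarize each chunk independently, then summarize the joined results."""
    # Map: chunks do not depend on each other, so they can run concurrently.
    # pool.map returns results in the same order as the input chunks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_summaries = list(pool.map(summarize, chunks))
    # Reduce: a single call over the concatenated partial summaries.
    return summarize("\n".join(partial_summaries))


def bracket_llm(text: str) -> str:
    """Hypothetical stand-in for a model call: wraps the text in brackets."""
    return f"[{text}]"


combined = map_reduce_summary(["chunk one", "chunk two"], bracket_llm)
```

Unlike the chained approach, the map step has no dependency between chunks, which is what makes it possible to process them in parallel.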


# Reference - Read Full Shareholder Letter
Optionally, run the next cell to view the shareholder letter. Cross-reference it with the outputs of options 1 and 2 to gauge the effectiveness of the summarization prompts.


In [12]:
print(letter)

As I sit down to write my second annual shareholder letter as CEO, I find myself optimistic and energized by what lies ahead for Amazon. Despite 2022 being one of the harder macroeconomic years in recent memory, and with some of our own operating challenges to boot, we still found a way to grow demand (on top of the unprecedented growth we experienced in the first half of the pandemic). We innovated in our largest businesses to meaningfully improve customer experience short and long term. And, we made important adjustments in our investment decisions and the way in which we’ll invent moving forward, while still preserving the long-term investments that we believe can change the future of Amazon for customers, shareholders, and employees.

While there were an unusual number of simultaneous challenges this past year, the reality is that if you operate in large, dynamic, global market segments with many capable and well-funded competitors (the conditions in which Amazon operates all of it