# Introduction to Automation with LangChain, Generative AI, and Python
**2.3: Text Summarization**
* Instructor: [Jeff Heaton](https://youtube.com/@HeatonResearch), WUSTL Center for Analytics and Business Insight (CABI), [Washington University in St. Louis](https://olin.wustl.edu/faculty-and-research/research-centers/center-for-analytics-and-business-insight/index.php)
* For more information visit the [class website](https://github.com/jeffheaton/cabi_genai_automation).

Large Language Models (LLMs) like GPT-4 can summarize text by extracting key information and presenting it in a concise format. They draw on the context and semantic relationships within the original text to generate a shorter version that retains the essential messages. Because this combines natural language understanding with generation, LLMs can interpret many types of text, from technical articles to narratives, and produce summaries that are coherent and relevant. The length and focus of the summary can be adjusted through the prompt, which makes LLMs particularly effective for digesting large volumes of information quickly.

## Summarize Single PDF

We will begin by seeing how to summarize a single PDF. LangChain loads document types, such as PDFs, using a document loader. There are document loaders for various data types. The following code sets up the model that we will use to summarize a single PDF with a generic summarization prompt.

In [1]:
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import PyPDFLoader
from langchain.prompts import PromptTemplate
from langchain_aws import ChatBedrock
from IPython.display import display_markdown

#MODEL = 'mistral.mistral-7b-instruct-v0:2'
#MODEL = 'meta.llama2-70b-chat-v1'
MODEL = 'amazon.titan-text-lite-v1'

# Initialize bedrock, use built in role
llm = ChatBedrock(
    model_id=MODEL,
    model_kwargs={"temperature": 0.2},
)



The following code uses the 'load_summarize_chain' function to build a summarization chain around the Large Language Model (LLM) with the "map_reduce" chain type, which summarizes each part of the document separately and then combines those partial summaries into one. The code loads a PDF from the given URL ("https://arxiv.org/pdf/1706.03762") with 'PyPDFLoader' and splits it into manageable parts ('load_and_split'). These parts are passed to the chain ('chain.invoke(docs)'), and the summary is read from the 'output_text' key of the result. Finally, the summary is displayed as markdown in the notebook output so that its formatting remains intact.

In [2]:
chain = load_summarize_chain(llm, chain_type="map_reduce")

url = "https://arxiv.org/pdf/1706.03762"
loader = PyPDFLoader(url)
docs = loader.load_and_split()
summary = chain.invoke(docs)['output_text']
display_markdown(summary, raw=True)

Token indices sequence length is longer than the specified maximum sequence length for this model (1417 > 1024). Running this sequence through the model will result in indexing errors


 The Transformer model is a neural network architecture that can generate coherent text from a source text. It has a stacked self-attention and point-wise, fully connected layers for both the encoder and decoder, and a residual connection and layer normalization. The model is capable of generating text in different languages and can handle long input sequences. The paper explores the use of self-attention layers in sequence transduction models, which are used to map one variable-length sequence of symbol representations to another sequence of equal length. The authors find that self-attention layers achieve comparable or better performance on various tasks, while requiring significantly fewer sequential
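
To make the "map_reduce" chain type used above more concrete, here is a minimal sketch of the idea in plain Python. It is not LangChain's implementation: `toy_summarize` is a hypothetical stand-in for an LLM call that simply keeps the first few words, and `map_reduce_summarize` is a name invented for this illustration. The real chain also deals with details this sketch omits, such as prompts and intermediate summaries that are still too long.

```python
from typing import Callable, List


def toy_summarize(text: str, max_words: int = 12) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first few words."""
    return " ".join(text.split()[:max_words])


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    # Map step: summarize every chunk independently of the others.
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: summarize the concatenation of the partial summaries.
    return summarize("\n".join(partial_summaries))


chunks = [
    "The Transformer relies entirely on attention and drops recurrence.",
    "Self-attention relates all positions of a sequence to each other.",
]
result = map_reduce_summarize(chunks, lambda t: toy_summarize(t, max_words=4))
print(result)
```

Because each chunk is summarized on its own, the map step never needs the whole document in one model call, which is why this approach suits documents longer than the model's context window.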

## Summarize with Custom Prompt

LangChain also accepts custom prompts, which lets you tailor summarization to specific requirements, such as producing the summary in a different language. In the code below, a custom prompt template instructs the model to write a concise summary in Spanish. The template consists of the instruction (written in English), a '{text}' placeholder for the content to summarize, and a closing 'SUMMARY:' cue. The same prompt is passed as both the 'map_prompt' and 'combine_prompt' parameters of 'load_summarize_chain', so it is applied when summarizing each section and again when combining the partial summaries.

As before, the code downloads the PDF with 'PyPDFLoader', splits it into sections, runs the summarization chain, and displays the result as markdown. Whether the output honors the language instruction depends on the model; the output recorded below is in English rather than Spanish.

In [3]:
TEMPLATE = """
Write a concise summary of the information presented. Write the summary in Spanish.

{text}

SUMMARY:"""
PROMPT = PromptTemplate(template=TEMPLATE, input_variables=["text"])

chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=PROMPT, combine_prompt=PROMPT)

url = "https://arxiv.org/pdf/1706.03762"
loader = PyPDFLoader(url)
docs = loader.load_and_split()
summary = chain.invoke(docs)['output_text']
display_markdown(summary, raw=True)


The Transformer model is a neural network architecture that can learn to sequence text. It has several layers of neural networks, each with a different purpose. The first layer is a position-wise feed-forward network, which takes the input and maps it to a fixed size. The second layer is a multi-head attention layer, which attaches different parts of the input to different parts of the output. The third layer is a feed-forward network with a softmax activation function, which takes the output of the attention layer and maps it to the output size. The fourth layer is a position-wise feed-forward network, which takes
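
As a rough illustration of what the custom prompt does at each stage, the sketch below fills the same template with plain Python string formatting. The helper `fill_prompt`, the sample chunk, and the two partial summaries are all made up for this example; LangChain's 'PromptTemplate' performs the substitution itself, so this only shows the resulting text, not the library's internals.

```python
TEMPLATE = """
Write a concise summary of the information presented. Write the summary in Spanish.

{text}

SUMMARY:"""


def fill_prompt(template: str, text: str) -> str:
    """Hypothetical helper: substitute the {text} placeholder in the template."""
    return template.format(text=text)


# Map stage: the placeholder receives one section of the document.
chunk = "Self-attention relates different positions of a single sequence."
map_prompt_text = fill_prompt(TEMPLATE, chunk)

# Combine stage: the placeholder receives the partial summaries (made-up values).
partial = ["Resumen parcial uno.", "Resumen parcial dos."]
combine_prompt_text = fill_prompt(TEMPLATE, "\n\n".join(partial))

print(map_prompt_text)
```

Using one template for both stages means the language instruction is repeated when the partial summaries are combined, not only when each section is summarized.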

## Summarize Multiple PDFs

We will now see how to summarize multiple documents into one. We will summarize the following four papers, each of which is very important to the field of GenAI.

* "[Attention Is All You Need](https://arxiv.org/pdf/1706.03762)" by Ashish Vaswani et al. (2017) - This paper introduced the Transformer architecture, which has become the backbone of most modern natural language processing systems, including models such as GPT and BERT.
* "[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/pdf/1810.04805)" by Jacob Devlin et al. (2018) - BERT (Bidirectional Encoder Representations from Transformers) changed the way contextual information is handled by training Transformer models bidirectionally. This methodology significantly improved the performance of models on various NLP tasks.
* "[Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165)" by Tom B. Brown et al. (2020) - Also known as the GPT-3 paper, it explores the capabilities of very large transformer-based models, demonstrating that scaling up the size of the models improves performance across a broad spectrum of NLP tasks, often requiring little to no task-specific data.
* "[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683)" by Colin Raffel et al. (2019) - This paper introduces T5 (Text-to-Text Transfer Transformer), which converts all NLP tasks into a unified text-to-text format, simplifying the application of transfer learning across different tasks.

We use the same process as before to load each of these PDF documents and collect its summary in a list.

In [4]:
urls = [
  "https://arxiv.org/pdf/1706.03762",
  "https://arxiv.org/pdf/1810.04805",
  "https://arxiv.org/pdf/2005.14165",
  "https://arxiv.org/pdf/1910.10683"
]

summaries = []

chain = load_summarize_chain(llm, chain_type="map_reduce")

for url in urls:
  print(f"Reading: {url}")
  # Load and split one paper, then summarize it with the chain created above.
  loader = PyPDFLoader(url)
  docs = loader.load_and_split()
  summary = chain.invoke(docs)['output_text']
  summaries.append(summary)

Reading: https://arxiv.org/pdf/1706.03762
Reading: https://arxiv.org/pdf/1810.04805
Reading: https://arxiv.org/pdf/2005.14165
Reading: https://arxiv.org/pdf/1910.10683


After obtaining the individual summaries, the next step is to combine them into a single, comprehensive overview. The code below first merges all the initial summaries into one long string. To manage the potentially large amount of text, it uses the CharacterTextSplitter class from LangChain to split this combined text into chunks of up to 500 characters, with an overlap of 100 characters between neighboring chunks so that context carries across chunk boundaries. These chunks are then converted into Document objects, each holding a segment of the summarized text.

A summarization chain with the same 'map_reduce' chain type then processes these Document objects, producing a final, condensed summary of the combined initial summaries. Finally, this summary is displayed as markdown to preserve its formatting.

In [5]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema.document import Document

# Merge the per-paper summaries into one string.
summary_str = " ".join(summaries)

# Split on spaces; the default separator is a blank line, which this
# space-joined string may not contain.
text_splitter = CharacterTextSplitter(separator=" ", chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_text(summary_str)
docs = [Document(page_content=t) for t in texts]

# Summarize the chunks into one final summary.
chain = load_summarize_chain(llm, chain_type="map_reduce")
final_summary = chain.invoke(docs)['output_text']
display_markdown(final_summary, raw=True)

 The Transformer model is a neural network architecture introduced in 2017 by Geoffrey Hinton and his colleagues at Google Brain. It has four main components: a self-attention mechanism, a feed-forward network, a position-wise feed-forward network, and an output layer. The self-attention mechanism helps the model learn to attend to relevant parts of the input, while the feed-forward network is used to compute the weighted sum of all the previous hidden states in the network.
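
To illustrate how a chunk size of 500 and an overlap of 100 interact, here is a simplified sketch of fixed-width splitting with overlap. The function `split_with_overlap` is hypothetical and written only for this example. LangChain's splitter works on separator boundaries rather than raw character offsets, so its chunks will differ; the sketch only shows the overlap arithmetic.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Simplified fixed-width splitter; a stand-in, not LangChain's algorithm."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# A 1200-character sample made of repeating digits.
sample = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])  # → [500, 500, 400]
```

The last 100 characters of each chunk are repeated at the start of the next one, so a sentence cut at a chunk boundary still appears whole in at least one neighbor when it is shorter than the overlap.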