# Summarization Using LangChain and OpenAI

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

### Summarizing Using a Basic Prompt

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)


In [3]:
text = """
Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability \
of AI hardware and extensibility of AI models.
Mojo is a new programming language that bridges the gap between research and production \
by combining the best of Python syntax with systems programming and metaprogramming.
With Mojo, you can write portable code that’s faster than C and seamlessly inter-op with the Python ecosystem.
When we started Modular, we had no intention of building a new programming language. \
But as we were building our platform with the intent to unify the world’s ML/AI infrastructure, \
we realized that programming across the entire stack was too complicated. Plus, we were writing a \
lot of MLIR by hand and not having a good time.
And although accelerators are important, one of the most prevalent and sometimes overlooked "accelerators" \
is the host CPU. Nowadays, CPUs have lots of tensor-core-like accelerator blocks and other AI acceleration \
units, but they also serve as the “fallback” for operations that specialized accelerators don’t handle, \
such as data loading, pre- and post-processing, and integrations with foreign systems. \
"""

messages = [
    SystemMessage(content='You are an expert copywriter with expertise in summarizing documents'),
    HumanMessage(content=f'Please provide a short and concise summary of the following text:\n TEXT: {text}')
]

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')



In [4]:
llm.get_num_tokens(text)

229

In [5]:
summary_output = llm(messages)

In [6]:
print(summary_output.content)

Mojo is a new programming language that combines the usability of Python with the performance of C. It aims to bridge the gap between research and production in the field of AI by offering portable code that is faster than C and seamlessly integrates with the Python ecosystem. Mojo was developed to simplify programming across the entire ML/AI infrastructure and to address the complexity of writing MLIR by hand. Additionally, Mojo recognizes the importance of host CPUs as accelerators and their role in handling operations that specialized accelerators cannot.


### Summarizing Using Prompt Templates

In [7]:
from langchain import PromptTemplate
from langchain.chains import LLMChain

In [8]:
template = '''
Write a concise and short summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
'''
prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)

In [9]:
llm.get_num_tokens(prompt.format(text=text, language='English'))

249
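The template above is a string with named placeholders that are filled in before the prompt is sent to the model. As a rough illustration of that substitution step, here is a minimal sketch using plain `str.format`. The helper name `format_prompt` is hypothetical, and this is not LangChain's implementation of `PromptTemplate.format`.

```python
# Illustration only: plain str.format standing in for the template-filling step.
template = (
    "Write a concise and short summary of the following text:\n"
    "TEXT: `{text}`\n"
    "Translate the summary to {language}.\n"
)


def format_prompt(template: str, **variables: str) -> str:
    """Fill every {placeholder}; a missing variable raises KeyError."""
    return template.format(**variables)


filled = format_prompt(
    template,
    text="Mojo is a new programming language.",
    language="Hindi",
)
print(filled)
```

Counting tokens on the filled prompt rather than on the raw text accounts for the instruction text that the template adds around it.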

In [10]:
chain = LLMChain(llm=llm, prompt=prompt)
summary = chain.run({'text': text, 'language':'hindi'})

In [11]:
print(summary)

Mojo एक नया प्रोग्रामिंग भाषा है जो पायथन की उपयोगिता को सी की प्रदर्शन के साथ मिलाकर एआई हार्डवेयर की अद्वितीय प्रोग्रामबिलिटी और एआई मॉडल की विस्तारयोग्यता को खोलता है। Mojo एक नई प्रोग्रामिंग भाषा है जो पायथन की सिंटेक्स को सिस्टम प्रोग्रामिंग और मेटाप्रोग्रामिंग के साथ मिलाकर अनुसंधान और उत्पादन के बीच की खाई को पूरा करती है। Mojo के साथ, आप C से तेज़ पोर्टेबल कोड लिख सकते हैं और पायथन इकोसिस्टम के साथ सहजता से इंटरऑप कर सकते हैं।


### Summarizing Using StuffDocumentsChain

In [12]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document


In [15]:
with open('../files/sj.txt', encoding='utf-8') as f:
    text = f.read()

# text

docs = [Document(page_content=text)]
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [16]:
template = '''Write a concise and short summary of the following text.
TEXT: `{text}`
'''
prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [17]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)
output_summary = chain.run(docs)

In [18]:
print(output_summary)

The speaker, Steve Jobs, shares three stories from his life during a commencement speech. The first story is about dropping out of college and how it led him to take a calligraphy class, which later influenced the design of the Macintosh computer. The second story is about getting fired from Apple and how it allowed him to start over and eventually create successful companies like NeXT and Pixar. The third story is about facing death when he was diagnosed with cancer and how it made him realize the importance of following his heart and not wasting time. He ends the speech by encouraging the graduates to stay hungry and stay foolish.
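Conceptually, the `stuff` chain type places all documents into a single prompt and makes one model call, so it only works when the combined text fits in the model's context window. The sketch below shows that idea with a stub in place of the model. `stuff_summarize` and `fake_llm` are hypothetical names used for illustration, not LangChain functions.

```python
from typing import Callable, List


def stuff_summarize(docs: List[str], template: str, llm: Callable[[str], str]) -> str:
    """Join all documents and send them to the model in one prompt."""
    combined = "\n\n".join(docs)
    return llm(template.format(text=combined))


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat model: records the prompt it receives.
    calls.append(prompt)
    return f"summary #{len(calls)}"


template = "Write a concise and short summary of the following text.\nTEXT: `{text}`\n"
result = stuff_summarize(["first document", "second document"], template, fake_llm)
print(len(calls), result)
```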


### Summarizing Large Documents Using map_reduce

In [19]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [21]:
with open('../files/sj.txt', encoding='utf-8') as f:
    text = f.read()

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [22]:
llm.get_num_tokens(text)

2653

In [23]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])

In [24]:
len(chunks)

2
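To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified character-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, whereas the hypothetical `split_with_overlap` below cuts at fixed offsets.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

Each piece starts with the last three characters of the previous one, which is how overlap keeps some context across chunk boundaries.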

In [25]:
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)
output_summary = chain.run(chunks)

In [26]:
print(output_summary)

Steve Jobs shares three stories from his life, including dropping out of college and how it influenced the design of the Macintosh computer, getting fired from Apple and finding success in new ventures, and his experience with cancer and the importance of living each day fully. He emphasizes the importance of following one's passion, not settling, and embracing the inevitability of death. The speaker encourages the audience to live their own lives, follow their hearts, and embrace qualities of staying hungry and foolish as they embark on their new journey after graduation.


In [27]:
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [28]:
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'
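The two prompts shown above correspond to the two phases of `map_reduce`: each chunk is summarized on its own (map), then the partial summaries are combined into a final summary (reduce). The following is a minimal sketch of that control flow with a stub model. `map_reduce_summarize` and `fake_llm` are hypothetical names, and the real chain may additionally collapse intermediate summaries when they are too long, which is left out here.

```python
from typing import Callable, List

# Default template as printed above, reused for both phases.
DEFAULT_TEMPLATE = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'


def map_reduce_summarize(
    chunks: List[str],
    llm: Callable[[str], str],
    map_template: str = DEFAULT_TEMPLATE,
    combine_template: str = DEFAULT_TEMPLATE,
) -> str:
    # Map: one model call per chunk.
    partial_summaries = [llm(map_template.format(text=chunk)) for chunk in chunks]
    # Reduce: one model call over the joined partial summaries.
    return llm(combine_template.format(text="\n\n".join(partial_summaries)))


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat model: records the prompt it receives.
    calls.append(prompt)
    return f"summary #{len(calls)}"


result = map_reduce_summarize(["chunk one", "chunk two"], fake_llm)
print(len(calls), result)
```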

### map_reduce with Custom Prompts

In [29]:
map_prompt = '''
Write a short and concise summary of the following:
Text: `{text}`
CONCISE SUMMARY:
'''
map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_prompt
)

In [30]:
combine_prompt = '''
Write a concise summary of the following text that covers the key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
Text: `{text}`
'''
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=['text'])

In [31]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)
output = summary_chain.run(chunks)

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-NgmXnpBpFaLLIbFlmm27vgpu on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..
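The retry message above comes from the library's built-in retry logic reacting to a rate limit. As an illustration of the general technique, here is a small exponential-backoff wrapper. Everything in it is an assumption made for the sketch: `RateLimitError` is a local stand-in class rather than the provider's exception, and `call_with_backoff` is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for a provider's rate-limit exception (assumption)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting exponentially longer after each rate-limit failure."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter avoids synchronized retries.
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("unreachable")


# Demo: a function that fails twice, then succeeds. Sleeps are recorded, not executed.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("simulated rate limit")
    return "ok"


waits: List[float] = []
result = call_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))
```

Passing `sleep` as a parameter keeps the demo instant; in real use the default `time.sleep` would actually pause between attempts.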

In [32]:
print(output)

Title: Steve Jobs' Speech on Life Lessons and Following Your Passion

Introduction:
In this speech, Steve Jobs shares three stories from his life, highlighting the importance of following one's passion and not settling for anything less. He discusses how his experiences with dropping out of college, getting fired from Apple, and battling cancer shaped his perspective on life.

Key Points:
- Story 1: Jobs dropped out of college and took a calligraphy class, which later influenced the design of the Macintosh computer.
- Story 2: Getting fired from Apple led Jobs to start new ventures and find success.
- Story 3: Jobs' experience with cancer reminded him of the importance of living each day to the fullest.
- Jobs encourages the audience to follow their passions and not be influenced by others' opinions.
- The text discusses the inevitability of death and emphasizes the importance of living one's own life.
- Following one's heart and intuition is emphasized.
- The text mentions The Whole E

### Summarizing Using the refine Chain

In [1]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [3]:
loader = UnstructuredPDFLoader('../files/attention_is_all_you_need.pdf')
data = loader.load()

[nltk_data] Downloading package punkt to /home/paul/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/paul/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [4]:
# print(data[0].page_content)

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [6]:
len(chunks)

5

In [7]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [8]:
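def print_token_cost(docs, price_per_1k_tokens=0.002):
    """Print the total token count of the documents and an estimated input cost."""
    import tiktoken
    enc = tiktoken.encoding_for_model('gpt-3.5-turbo')
    total_tokens = sum(len(enc.encode(doc.page_content)) for doc in docs)
    print(f'Total Tokens: {total_tokens}')
    # The price is an assumed rate in USD per 1,000 tokens; check current pricing.
    print(f'Estimated Cost in USD: {total_tokens / 1000 * price_per_1k_tokens:.6f}')


print_token_cost(chunks)

Total Tokens: 10041
Estimated Cost in USD: 0.020082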
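Besides token cost, the number of model calls matters when the account has a low requests-per-minute limit. The sketch below estimates calls per chain type for a given number of chunks. `llm_calls` is a hypothetical helper, and the `map_reduce` figure assumes a single combine call with no intermediate collapsing.

```python
def llm_calls(chain_type: str, n_chunks: int) -> int:
    """Rough number of model calls a summarize chain makes for n_chunks chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if chain_type == "stuff":
        return 1  # everything goes into one prompt
    if chain_type == "map_reduce":
        return n_chunks + 1  # one call per chunk plus one combine call (assumption)
    if chain_type == "refine":
        return n_chunks  # one initial call, then one refinement per remaining chunk
    raise ValueError(f"unknown chain type: {chain_type}")


for chain_type in ("stuff", "map_reduce", "refine"):
    print(chain_type, llm_calls(chain_type, 5))
```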


In [9]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    verbose=True
)
output_summary = chain.run(chunks)



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Attention Is All You Need

Ashish Vaswani∗ Google Brain avaswani@google.com

Llion Jones∗ Google Research llion@google.com

Noam Shazeer∗ Google Brain noam@google.com

Niki Parmar∗ Google Research nikip@google.com

Jakob Uszkoreit∗ Google Research usz@google.com

Aidan N. Gomez∗ † University of Toronto aidan@cs.toronto.edu

Łukasz Kaiser∗ Google Brain lukaszkaiser@google.com

Illia Polosukhin∗ ‡ illia.polosukhin@gmail.com

Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based 


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary.
We have provided an existing summary up to a certain point: The paper introduces a new network architecture called the Transformer, which is based solely on attention mechanisms and does not use recurrent or convolutional neural networks. The Transformer model achieves superior results in machine translation tasks, is more parallelizable, and requires less training time compared to existing models. The paper also discusses the advantages of self-attention and describes the architecture of the Transformer, including the encoder and decoder stacks and the attention mechanism used.
We have the opportunity to refine the existing summary (only if needed) with some more context below.
------------
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \tag{1}$$

The two most commonly used attention functions are additive attention [2], and dot-product (multi- 



> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary.
We have provided an existing summary up to a certain point: The paper introduces the Transformer, a network architecture based solely on attention mechanisms, which achieves superior results in machine translation tasks compared to existing models. The Transformer is more parallelizable and requires less training time. The paper also discusses the advantages of self-attention and describes the architecture of the Transformer, including the encoder and decoder stacks and the attention mechanism used. The paper further compares self-attention layers to recurrent and convolutional layers in terms of computational complexity, parallelizability, and the length of paths between long-range dependencies in the network. Additionally, the paper explores the interpretability of self-attention and presents examples of attention distributions related to



> Finished chain.

> Finished chain.


In [10]:
print(output_summary)

The paper introduces the Transformer, a network architecture based solely on attention mechanisms, which achieves superior results in machine translation tasks compared to existing models. The Transformer is more parallelizable and requires less training time. The paper also discusses the advantages of self-attention and describes the architecture of the Transformer, including the encoder and decoder stacks and the attention mechanism used. The paper further compares self-attention layers to recurrent and convolutional layers in terms of computational complexity, parallelizability, and the length of paths between long-range dependencies in the network. Additionally, the paper explores the interpretability of self-attention and presents examples of attention distributions related to the syntactic and semantic structure of sentences. The training regime, hardware, schedule, optimizer, and regularization techniques used for training the models are described. The paper also presents the re
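The verbose log above shows the pattern behind `refine`: the first chunk produces an initial summary, and every later chunk is sent together with the summary so far so the model can update it. Here is a minimal sketch of that loop with a stub model. `refine_summarize` and `fake_llm` are hypothetical names, and the refine template is abridged from the logged prompt, so it should not be read as the library's exact default.

```python
from typing import Callable, List

INITIAL_TEMPLATE = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'
# Abridged from the logged prompt above; any text after the context block is omitted.
REFINE_TEMPLATE = (
    "Your job is to produce a final summary.\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary (only if needed) "
    "with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
)


def refine_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    """Summarize the first chunk, then refine that summary once per remaining chunk."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    summary = llm(INITIAL_TEMPLATE.format(text=chunks[0]))
    for chunk in chunks[1:]:
        summary = llm(REFINE_TEMPLATE.format(existing_answer=summary, text=chunk))
    return summary


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat model: records the prompt it receives.
    calls.append(prompt)
    return f"summary #{len(calls)}"


result = refine_summarize(["part one", "part two", "part three"], fake_llm)
print(len(calls), result)
```

Because each call depends on the previous summary, the calls run sequentially, unlike the map phase of `map_reduce`.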

### refine With Custom Prompts

In [11]:
prompt_template = """Write a concise summary of the following extracting the key information:
Text: `{text}`
CONCISE SUMMARY:"""
initial_prompt = PromptTemplate(template=prompt_template, input_variables=['text'])

refine_template = '''
    Your job is to produce a final summary.
    I have provided an existing summary up to a certain point: {existing_answer}.
    Please refine the existing summary with some more context below.
    ------------
    {text}
    ------------
    Start the final summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
    by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.

'''
refine_prompt = PromptTemplate(
    template=refine_template,
    input_variables=['existing_answer', 'text']
)


In [12]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=False

)
output_summary = chain.run(chunks)


In [13]:
print(output_summary)

The paper introduces the Transformer, a new network architecture that relies solely on attention mechanisms for sequence transduction tasks. The Transformer model achieves superior results in machine translation tasks, is more parallelizable, and requires less training time compared to existing models. It achieves state-of-the-art results in translation quality and generalizes well to other tasks. The attention mechanism in the Transformer is based on dot-product attention, which is faster and more space-efficient compared to additive attention. The model also employs multi-head attention, allowing it to jointly attend to information from different representation subspaces at different positions. The Transformer uses self-attention layers in the encoder and decoder, as well as encoder-decoder attention layers. The model also includes position-wise feed-forward networks and positional encodings to capture the order of the sequence. The Transformer architecture offers a more efficient an

### Summarizing Using LangChain Agents

In [14]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [15]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [16]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
wikipedia = WikipediaAPIWrapper()

In [17]:
tools = [
    Tool(
        name="Wikipedia",
        func=wikipedia.run,
        description="Useful for when you need to get information from wikipedia about a single topic"
    )
]

In [18]:
agent_executor = initialize_agent(tools, llm, agent='zero-shot-react-description', verbose=True)

In [19]:
output = agent_executor.run('Can you please provide a short summary of George Washington?')



> Entering new AgentExecutor chain...
I should use Wikipedia to find a short summary of George Washington.
Action: Wikipedia
Action Input: George Washington



Observation: Page: George Washington
Summary: George Washington (February 22, 1732 – December 14, 1799) was an American military officer, statesman, and Founding Father who served as the first president of the United States from 1789 to 1797. Appointed by the Second Continental Congress as commander of the Continental Army in June 1775, Washington led Patriot forces to victory in the American Revolutionary War and then served as president of the Constitutional Convention in 1787, which drafted and ratified the Constitution of the United States and established the American federal government. Washington has thus been called the "Father of his Country".
Washington's first public office, from 1749 to 1750, was as surveyor of Culpeper County in the Colony of Virginia. He subsequently received military training and was assigned command of the Virginia Regiment during the French and Indian War. He was later elected to the Virginia House of Burgesses and was named a delegate to 


[32;1m[1;3mI now know the final answer.
Final Answer: George Washington (February 22, 1732 – December 14, 1799) was an American military officer, statesman, and Founding Father who served as the first president of the United States from 1789 to 1797. He played a crucial role in the American Revolutionary War, led the Constitutional Convention, and implemented a strong national government as president. He is considered one of the greatest U.S. presidents. However, his legacy is also marred by his ownership of slaves and his complicated relationship with slavery.[0m

[1m> Finished chain.[0m


In [20]:
print(output)

George Washington (February 22, 1732 – December 14, 1799) was an American military officer, statesman, and Founding Father who served as the first president of the United States from 1789 to 1797. He played a crucial role in the American Revolutionary War, led the Constitutional Convention, and implemented a strong national government as president. He is considered one of the greatest U.S. presidents. However, his legacy is also marred by his ownership of slaves and his complicated relationship with slavery.
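The agent trace above follows a loop: the model emits an `Action` and `Action Input`, the named tool runs, its result is fed back as an `Observation`, and the loop ends when the model emits a `Final Answer`. The sketch below imitates that loop with a scripted stub model and a fake tool. `run_agent`, `scripted_llm`, and `fake_wikipedia` are hypothetical names, and the parsing is a simplification, not LangChain's output parser.

```python
import re
from typing import Callable, Dict, List

ACTION_RE = re.compile(r"Action:\s*(.+)\nAction Input:\s*(.+)")


def run_agent(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate between model replies and tool calls until a final answer appears."""
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm(question + scratchpad)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            raise ValueError("reply contains neither an action nor a final answer")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        observation = tools[tool_name](tool_input)
        # Feed the tool result back so the next reply can use it.
        scratchpad += f"\n{reply}\nObservation: {observation}"
    raise RuntimeError("no final answer within max_steps")


tool_calls: List[str] = []


def fake_wikipedia(query: str) -> str:
    # Fake tool: records the query instead of calling Wikipedia.
    tool_calls.append(query)
    return f"Page: {query}"


replies = iter([
    "I should use Wikipedia.\nAction: Wikipedia\nAction Input: George Washington",
    "I now know the final answer.\nFinal Answer: First president of the United States.",
])


def scripted_llm(prompt: str) -> str:
    # Scripted stand-in for the model: returns the next canned reply.
    return next(replies)


answer = run_agent("Summarize George Washington.", scripted_llm, {"Wikipedia": fake_wikipedia})
print(answer)
```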
