## Summarization

In [1]:
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv(), override=True)

True

### A) Basic Prompt

Useful for summarizing a few sentences or paragraphs. The input is limited by the model's context window (around 4,096 tokens for the model used here). It makes only a single call to the LLM, and the LLM has access to all the data at once.

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage, AIMessage

In [3]:
text = '''
Mojo is a new programming language for AI and is said to be 35000X faster than Python. \
With AI (Artificial Intelligence) on the rise, we need appropriate tools to build efficiently. \
Mojo was created by Chris Lattner, the creator of the Swift programming language and the LLVM Compiler Infrastructure. \
Lattner started working on Mojo in 2019, and the language was first released in May 2023. \
Mojo Lang is a programming language designed for AI hardware, such as GPUs with CUDA support. \
It accomplishes this through the use of Multi-Level Intermediate Representation (MLIR) \
to scale hardware varieties without complexity.
'''
message = [
    SystemMessage(content='You are an expert copywriter with expertise in summarising documents'),
    HumanMessage(content=f"Please provide a short and concise summary of the following text:\n TEXT: {text}")
]
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [4]:
llm.get_num_tokens(text)

129

In [5]:
summary_output = llm(message)

In [6]:
summary_output.content

'Mojo is a new programming language for AI that is 35000X faster than Python. Created by Chris Lattner, the creator of Swift and LLVM, Mojo is designed for AI hardware and uses Multi-Level Intermediate Representation (MLIR) to scale hardware varieties without complexity.'

### Prompt Templates

This method only works when the prompt (template plus text) and the generated summary together stay within the model's maximum token limit.
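
The limit described above can be checked before calling the model. The following is a minimal sketch, not part of LangChain: the helper name `fits_in_context`, the 4,096-token limit, and the 256 tokens reserved for the summary are all assumptions chosen for illustration. In practice the prompt token count would come from something like `llm.get_num_tokens(...)`.

```python
def fits_in_context(prompt_tokens: int,
                    max_summary_tokens: int = 256,
                    context_limit: int = 4096) -> bool:
    """Return True if the prompt plus the space reserved for the summary fits the context window.

    All default values are illustrative assumptions, not values read from the model.
    """
    return prompt_tokens + max_summary_tokens <= context_limit


# A short prompt fits comfortably; a prompt close to the limit leaves no room for the summary.
print(fits_in_context(147))   # True
print(fits_in_context(4000))  # False
```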

In [7]:
from langchain import PromptTemplate
from langchain.chains import LLMChain

In [8]:
template = '''
Write a short and concise summary of text: `{text}`
Translate the summary to {language}
'''

prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)

In [9]:
llm.get_num_tokens(prompt.format(text=text, language='English'))

147

In [10]:
chain = LLMChain(llm=llm, prompt=prompt)
summary = chain.run({'text': text, 'language': 'french'})
summary

"Mojo est un nouveau langage de programmation pour l'IA et est dit être 35000 fois plus rapide que Python. Avec l'IA en plein essor, nous avons besoin d'outils appropriés pour construire efficacement. Mojo a été créé par Chris Lattner, le créateur du langage de programmation Swift et de l'infrastructure du compilateur LLVM. Lattner a commencé à travailler sur Mojo en 2019 et le langage a été publié pour la première fois en mai 2023. Mojo Lang est un langage de programmation conçu pour le matériel d'IA, tel que les GPU avec prise en charge de CUDA. Il réalise cela grâce à l'utilisation de la Représentation Interne à Niveaux Multiples (MLIR) pour mettre à l'échelle les variétés de matériel sans complexité."

### Summarizing using StuffDocumentsChain

In the stuffing method, all the text to be summarized is stuffed into the prompt as context for the language model. This is the same idea as the previous methods, applied to `Document` objects. It makes a single call to the LLM, and the LLM has access to all the data at once when generating the summary. It does not work for large documents, because the prompt would exceed the model's context length, so it is only feasible for small amounts of data.
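
The idea can be sketched in plain Python. This is an illustration of the technique only, not LangChain's implementation: `stuff_summarize` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that records prompts instead of calling a real model.

```python
from typing import Callable, List


def stuff_summarize(docs: List[str], llm: Callable[[str], str]) -> str:
    """Concatenate every document into one prompt and make a single LLM call."""
    context = "\n\n".join(docs)
    prompt = f"Write a concise summary of the following:\n\n{context}\n\nCONCISE SUMMARY:"
    return llm(prompt)


# Hypothetical stand-in for a real model: it only records the prompts it receives.
stuff_calls: List[str] = []


def fake_llm(prompt: str) -> str:
    stuff_calls.append(prompt)
    return f"summary of {len(prompt)} characters"


stuff_result = stuff_summarize(["first document", "second document"], fake_llm)
print(len(stuff_calls))  # 1: a single call, however many documents are passed
```

Because everything travels in one prompt, the prompt grows with the total document size, which is why this approach breaks down once the context limit is reached.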

In [11]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [12]:
with open("../files/sj.txt") as f:
    text = f.read()

docs = [Document(page_content=text)]
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [13]:
template = '''
Write a short and concise summary of text: `{text}`
'''

prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

chain = load_summarize_chain(llm, chain_type='stuff', prompt=prompt, verbose=False)
output = chain.run(docs)
output

'The speaker, Steve Jobs, shares three stories from his life during a commencement speech. The first story is about dropping out of college and how it led him to take a calligraphy class that later influenced the design of the Macintosh computer. The second story is about getting fired from Apple, which ultimately led him to start new ventures and find success. The third story is about facing death and how it reminded him of the importance of living life to the fullest. Jobs encourages the graduates to follow their passions, trust their intuition, and stay hungry and foolish in their pursuits.'

### Summarizing Large Documents using MapReduce
Used for large documents that exceed the token limit of the model.
It splits the document into small chunks that each fit within the token limit. It then summarises each chunk and produces a summary of the summaries.
It uses an initial prompt to summarise each chunk and a second prompt to combine the summaries into the final one.
It scales to larger documents, and the LLM calls on individual chunks are independent, so they can run in parallel. It requires many more LLM calls (and therefore higher API cost) than the stuff chain. It also loses some information during the final combining call.
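
The map and reduce steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LangChain's implementation: `map_reduce_summarize` and `fake_llm` are hypothetical names, the prompts are placeholders, and `fake_llm` stands in for a real model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_reduce_summarize(pieces: List[str], llm: Callable[[str], str]) -> str:
    """Summarize each piece independently (map), then summarize the summaries (reduce)."""
    # Map step: the per-chunk calls do not depend on each other, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        partial_summaries = list(pool.map(lambda p: llm(f"Summarize: {p}"), pieces))
    # Reduce step: one extra call combines the partial summaries into the final one.
    combined = "\n".join(partial_summaries)
    return llm(f"Summarize these summaries: {combined}")


# Hypothetical stand-in for a real model; list.append is safe to call from threads in CPython.
mr_calls: List[str] = []


def fake_llm(prompt: str) -> str:
    mr_calls.append(prompt)
    return f"<summary of {len(prompt)} characters>"


mr_result = map_reduce_summarize(["part one", "part two", "part three"], fake_llm)
print(len(mr_calls))  # 4: one call per chunk plus one combining call
```

The final call only sees the partial summaries, not the original text, which is where information can be lost.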

In [14]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [15]:
with open("../files/sj.txt") as f:
    text = f.read()

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [16]:
llm.get_num_tokens(text)

2653

In [17]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])
len(chunks)

2
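
To illustrate what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-window splitter. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and sentences; `split_with_overlap` is a hypothetical helper. Sizes are in characters, which is also how the LangChain splitter measures length by default, so a `chunk_size` of 10000 means characters, not tokens.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_overlap characters before the end of the previous one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return pieces


demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```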

In [18]:
chain = load_summarize_chain(llm, chain_type='map_reduce', verbose=False)
output_summary = chain.run(chunks)
output_summary

"Steve Jobs shares three stories from his life, including dropping out of college and how it influenced the design of the Macintosh computer, getting fired from Apple and finding success in new ventures, and his experience with cancer and the importance of living each day fully. He encourages the audience to follow their passions, not settle, and remember that life is short. The speaker also reflects on the inevitability of death, urges the audience to live their own lives, and mentions The Whole Earth Catalog's message of staying hungry and foolish."

In [19]:
# prompt for summarizing each chunk
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [20]:
# prompt for combining the summaries
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

### MapReduce with custom prompts

The default prompts used in the previous MapReduce example are fixed. Custom map and combine prompts can be supplied instead.

In [21]:
# First prompt used to summarise each chunk is called map_prompt
map_prompt = '''
Write a short and concise summary of the following:
Text: `{text}`
CONCISE SUMMARY:
'''
map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_prompt
)

In [22]:
# Second prompt that summarises the summaries
combine_prompt='''
Write a concise summary of the following text that covers the key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED 
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
Text: `{text}`
'''
combine_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=combine_prompt
)

In [23]:
chain = load_summarize_chain(
    llm=llm,
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    chain_type='map_reduce',
    verbose=False
)
output = chain.run(chunks)
print(output)

Title: Steve Jobs' Speech on Life Lessons and Following Your Passion

Introduction:
In this speech, Steve Jobs shares three stories from his life, highlighting the importance of following one's passion and living life to the fullest. He discusses how dropping out of college, getting fired from Apple, and his experience with cancer shaped his perspective on life.

Key Points:
- Story 1: Jobs dropped out of college and took a calligraphy class, which later influenced the design of the Macintosh computer.
- Story 2: Getting fired from Apple led Jobs to start new ventures and ultimately find success.
- Story 3: Jobs' experience with cancer reminded him of the importance of living each day to the fullest.

- Jobs emphasizes the significance of following one's passion and not settling for less.
- He encourages readers to live their own lives and not be influenced by others' opinions.
- The author shares a personal anecdote about The Whole Earth Catalog and the message of "Stay Hungry. Stay F

### Using CombineDocumentChain

Uses a refine chain (`chain_type='refine'`). The document is split into chunks here as well.
It summarises the first chunk of data, and the summary of the first chunk is passed to the LLM together with the second chunk.
The LLM then refines the summary and passes it on with the next chunk.
This is repeated until the nth chunk:
- summarise(chunk #1) => summary #1
- summarise(summary #1 + chunk #2) => summary #2
- summarise(summary #2 + chunk #3) => summary #3
- ...
- summarise(summary #n-1 + chunk #n) => final summary

It is less lossy than MapReduce and uses a more relevant context, resulting in better summarisation. However, it requires one LLM call per chunk, and these calls are not independent: each one depends on the previous summary, so they cannot be parallelized.
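
The refinement loop can be sketched in plain Python. This is an illustration of the technique only, not LangChain's implementation: `refine_summarize` and `fake_llm` are hypothetical names, the prompts are placeholders, and `fake_llm` stands in for a real model.

```python
from typing import Callable, List


def refine_summarize(pieces: List[str], llm: Callable[[str], str]) -> str:
    """Summarize the first piece, then refine that summary with each later piece in order."""
    if not pieces:
        return ""
    summary = llm(f"Summarize: {pieces[0]}")
    for piece in pieces[1:]:
        # Each call needs the previous summary, so the calls must run one after another.
        summary = llm(f"Existing summary: {summary}\nRefine it with: {piece}")
    return summary


# Hypothetical stand-in for a real model: it numbers its answers and records its prompts.
refine_calls: List[str] = []


def fake_llm(prompt: str) -> str:
    refine_calls.append(prompt)
    return f"summary #{len(refine_calls)}"


refine_result = refine_summarize(["chunk 1", "chunk 2", "chunk 3"], fake_llm)
print(refine_result)      # summary #3
print(len(refine_calls))  # 3: one call per chunk
```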

In [31]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader

In [32]:
# pip install unstructured -q

In [33]:
# pip install pdf2image -q

In [34]:
loader = UnstructuredPDFLoader('../files/attention_is_all_you_need.pdf')
data = loader.load()

In [37]:
# print(data[0].page_content)

In [40]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [41]:
len(chunks)

5

In [42]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [44]:
def print_embedding_cost(texts):
    # Count the tokens in all chunks and print a cost estimate
    # at a hard-coded rate of 0.002 USD per 1,000 tokens.
    import tiktoken
    enc = tiktoken.encoding_for_model('text-embedding-ada-002')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Total Tokens: {total_tokens}')
    print(f'Embedding Cost in USD: {total_tokens/1000 * 0.002:.6f}')

print_embedding_cost(chunks)

Total Tokens: 9861
Embedding Cost in USD: 0.019722


In [45]:
chain = load_summarize_chain(llm, chain_type='refine', verbose=False)
output_summary = chain.run(chunks)
output_summary

'The paper introduces the Transformer, a new network architecture based solely on attention mechanisms. The Transformer model achieves superior results in machine translation tasks and offers advantages such as parallelizability and reduced training time compared to existing models. The paper discusses the benefits of self-attention and describes the architecture of the Transformer model, which consists of stacked self-attention and fully connected layers in both the encoder and decoder. The attention mechanism used in the model is called Scaled Dot-Product Attention. Additionally, the paper explores the use of multi-head attention and positional encodings in the Transformer model. The authors compare self-attention layers to recurrent and convolutional layers in terms of computational complexity, parallelizability, and the ability to learn long-range dependencies. The paper also presents the training regime for the models, including the training data, hardware, optimizer, and regulari

### Summarizing using Refine chain and custom prompt

In [57]:
prompt_template= '''
Write a short and concise summary of the following extracting the key information:
Text: `{text}`
CONCISE SUMMARY:
'''

initial_prompt = PromptTemplate(
    input_variables=['text'],
    template=prompt_template
)

refine_template = '''
Produce a final summary.
I have provided an existing summary up to a certain point: {existing_answer}. 
Please refine the existing summary with some more context below
*******
{text}
*******
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED 
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
'''
refine_prompt = PromptTemplate(
    input_variables=['existing_answer', 'text'],
    template=refine_template
)
chain = load_summarize_chain(
    llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=False
)
output_summary = chain.run(chunks)
output_summary

"Title: The Transformer: A New Network Architecture Based on Attention Mechanisms\n\nIntroduction:\nThe Transformer is a novel network architecture that relies solely on attention mechanisms, eliminating the need for recurrent or convolutional neural networks. It has shown remarkable performance in machine translation tasks, offering improved quality, parallelizability, and reduced training time. The Transformer's versatility extends to other tasks, enabling it to compute input and output representations without relying on sequence-aligned RNNs or convolution.\n\nSummary:\n\n- The attention mechanism used in the Transformer is called Scaled Dot-Product Attention, which computes the dot products of queries and keys and applies a softmax function to obtain weights.\n- The Transformer model incorporates multi-head attention, allowing it to jointly attend to information from different representation subspaces at different positions.\n- The model architecture consists of stacked self-attent

### Summarizing using LangChain Agents

In [64]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [65]:
# pip install wikipedia -q

In [66]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
wikipedia = WikipediaAPIWrapper()

In [67]:
tools = [
    Tool(
        name='wikipedia',
        func=wikipedia.run,
        description="Useful for getting information about a topic from Wikipedia"
    )
]

In [75]:
agent_executor = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=False)
output = agent_executor.run("Give a summary about King Jaja of Opobo Kingdom")

In [76]:
output

'King Jaja of Opobo Kingdom was a prominent leader in the Opobo Kingdom, which is located in Rivers State, Nigeria. The Opobo Kingdom is divided into 14 sections and King Jaja was one of the leaders of the Jaja section.'