# Project: Summarization App Using LangChain and OpenAI
This is part of my **"Learn LangChain, Pinecone & OpenAI: Build Next-Gen LLM Apps"** course.

https://www.udemy.com/course/master-langchain-pinecone-openai-build-llm-applications/?referralCode=4B17E3BD4CBBEA3B8321

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

### Summarizing Using a Basic Prompt

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)


In [None]:
text = """
Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability \
of AI hardware and extensibility of AI models.
Mojo is a new programming language that bridges the gap between research and production \
by combining the best of Python syntax with systems programming and metaprogramming.
With Mojo, you can write portable code that’s faster than C and seamlessly inter-op with the Python ecosystem.
When we started Modular, we had no intention of building a new programming language. \
But as we were building our platform with the intent to unify the world’s ML/AI infrastructure, \
we realized that programming across the entire stack was too complicated. Plus, we were writing a \
lot of MLIR by hand and not having a good time.
And although accelerators are important, one of the most prevalent and sometimes overlooked "accelerators" \
is the host CPU. Nowadays, CPUs have lots of tensor-core-like accelerator blocks and other AI acceleration \
units, but they also serve as the “fallback” for operations that specialized accelerators don’t handle, \
such as data loading, pre- and post-processing, and integrations with foreign systems. \
"""

messages = [
    SystemMessage(content='You are an expert copywriter with expertise in summarizing documents'),
    HumanMessage(content=f'Please provide a short and concise summary of the following text:\n TEXT: {text}')
]

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')



In [None]:
llm.get_num_tokens(text)

In [None]:
summary_output = llm(messages)

In [None]:
print(summary_output.content)

### Summarizing Using Prompt Templates

In [None]:
from langchain import PromptTemplate
from langchain.chains import LLMChain

In [None]:
template = '''
Write a concise and short summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
'''
prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)

In [None]:
llm.get_num_tokens(prompt.format(text=text, language='English'))

In [None]:
chain = LLMChain(llm=llm, prompt=prompt)
summary = chain.run({'text': text, 'language': 'Hindi'})

In [None]:
print(summary)

### Summarizing Using StuffDocumentsChain

In [None]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document


In [None]:
with open('files/sj.txt', encoding='utf-8') as f:
    text = f.read()

# text

docs = [Document(page_content=text)]
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [None]:
template = '''Write a concise and short summary of the following text.
TEXT: `{text}`
'''
prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [None]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)
output_summary = chain.run(docs)

In [None]:
print(output_summary)
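
The `stuff` chain type places every document into a single prompt and makes one LLM call. The sketch below illustrates that idea in plain Python, without LangChain or any API call. `build_stuff_prompt` is a hypothetical helper, and the blank-line separator between documents is an assumption made for illustration.

```python
def build_stuff_prompt(template: str, documents: list[str], separator: str = "\n\n") -> str:
    """Join all documents and substitute them into one prompt (hypothetical helper)."""
    combined = separator.join(documents)
    return template.format(text=combined)


# Same template shape as the cell above; the documents are made-up placeholders.
stuff_template = "Write a concise and short summary of the following text.\nTEXT: `{text}`\n"
docs_text = ["First document.", "Second document."]

stuffed_prompt = build_stuff_prompt(stuff_template, docs_text)
print(stuffed_prompt)
```

Because everything goes into one prompt, this approach only works when the combined text fits within the model's context window.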

### Summarizing Large Documents Using map_reduce

In [None]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
with open('files/sj.txt', encoding='utf-8') as f:
    text = f.read()

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [None]:
llm.get_num_tokens(text)

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])

In [None]:
len(chunks)
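
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width splitter. It is only an illustration: `RecursiveCharacterTextSplitter` also tries to split on separators such as paragraph and line breaks, so the chunks it produces will differ. `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    result = []
    for start in range(0, len(text), step):
        result.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return result


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(pieces)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The overlap repeats the tail of one chunk at the head of the next, which helps preserve context that would otherwise be cut at a chunk boundary.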

In [None]:
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)
output_summary = chain.run(chunks)

In [None]:
print(output_summary)

In [None]:
chain.llm_chain.prompt.template

In [None]:
chain.combine_document_chain.llm_chain.prompt.template
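
The two templates above correspond to the two phases of `map_reduce`: each chunk is summarized on its own (map), then the partial summaries are joined and summarized once more (reduce). The sketch below shows that control flow with a stand-in function instead of a real model call. `fake_llm` and `map_reduce_summarize` are hypothetical names, the prompts are abbreviated placeholders, and the real chain may additionally collapse intermediate summaries when they are too long.

```python
from typing import Callable, Sequence

calls = []  # every prompt sent to the stand-in model


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: records the prompt and returns a numbered label."""
    calls.append(prompt)
    return f"summary#{len(calls)}"


def map_reduce_summarize(chunk_texts: Sequence[str], llm_call: Callable[[str], str]) -> str:
    # Map step: one independent call per chunk.
    partial_summaries = [llm_call(f"Summarize:\n{chunk}") for chunk in chunk_texts]
    # Reduce step: a single call over the joined partial summaries.
    combined = "\n\n".join(partial_summaries)
    return llm_call(f"Summarize:\n{combined}")


final_summary = map_reduce_summarize(["chunk A", "chunk B", "chunk C"], fake_llm)
print(final_summary)  # summary#4
print(len(calls))     # 4: three map calls plus one reduce call
```

Since the map calls do not depend on each other, they can in principle run in parallel; only the reduce call has to wait for all of them.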

### map_reduce with Custom Prompts

In [None]:
map_prompt = '''
Write a short and concise summary of the following:
Text: `{text}`
CONCISE SUMMARY:
'''
map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_prompt
)

In [None]:
combine_prompt = '''
Write a concise summary of the following text that covers the key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
Text: `{text}`
'''
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=['text'])

In [None]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)
output = summary_chain.run(chunks)

In [None]:
print(output)

### Summarizing Using the refine Chain

In [1]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [3]:
#!pip install unstructured -q

In [11]:
#!pip install pdfminer
#!pip install pdf2image
#!pip install opencv-python
#!pip install unstructured-inference
#!pip install pikepdf

In [12]:
loader = UnstructuredPDFLoader('files/attention-is-all-you-need.pdf')
data = loader.load()

In [14]:
#print(data[0].page_content)

In [15]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [16]:
len(chunks)

4

In [17]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [18]:
def print_embedding_cost(texts):
    # Counts tokens with the gpt-3.5-turbo encoding and applies a hard-coded
    # rate of 0.002 USD per 1K tokens, which may not reflect current pricing.
    import tiktoken
    enc = tiktoken.encoding_for_model('gpt-3.5-turbo')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Total Tokens: {total_tokens}')
    print(f'Embedding Cost in USD: {total_tokens / 1000 * 0.002:.6f}')


print_embedding_cost(chunks)

Total Tokens: 8035
Embedding Cost in USD: 0.016070
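
The figure above is the token count multiplied by a per-1,000-token rate. The same arithmetic is shown below as a small reusable function. The default rate of 0.002 USD per 1K tokens is the value hard-coded in the cell above and may not match current pricing. `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(total_tokens: int, usd_per_1k_tokens: float = 0.002) -> float:
    """Return the estimated cost of total_tokens at the given per-1K-token rate."""
    return total_tokens / 1000 * usd_per_1k_tokens


estimated = estimate_cost_usd(8035)  # token count taken from the output above
print(f"{estimated:.6f}")  # 0.016070
```

This counts only the chunk text. Prompt instructions, intermediate summaries, and generated output also consume tokens, so treat the result as a lower bound.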


In [19]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    verbose=True
)
output_summary = chain.run(chunks)



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Attention Is All You Need

Ashish Vaswani∗ Google Brain avaswani@google.com

Noam Shazeer∗ Google Brain noam@google.com

Niki Parmar∗ Google Research nikip@google.com

Jakob Uszkoreit∗ Google Research usz@google.com

Llion Jones∗ Google Research llion@google.com

Aidan N. Gomez∗ † University of Toronto aidan@cs.toronto.edu

Łukasz Kaiser∗ Google Brain lukaszkaiser@google.com

Illia Polosukhin∗ ‡ illia.polosukhin@gmail.com

Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convoluti

In [20]:
print(output_summary)

The paper proposes a new network architecture called the Transformer, which is based solely on attention mechanisms and does not use recurrent or convolutional neural networks. The Transformer model achieves superior results in machine translation tasks while being more parallelizable and requiring less training time compared to existing models. The paper describes the architecture of the Transformer, including the encoder and decoder stacks, and explains the attention mechanism used in the model. The paper also discusses the use of multi-head attention and positional encodings in the Transformer model. The authors compare self-attention layers to recurrent and convolutional layers in terms of computational complexity, parallelizability, and the ability to learn long-range dependencies. They find that self-attention layers have lower computational complexity and can learn long-range dependencies more easily. Additionally, the paper provides details on the training regime, hardware, sch
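
The `refine` chain processes chunks sequentially: it summarizes the first chunk, then for each later chunk asks the model to update the running summary using that chunk. The sketch below shows that loop with a stand-in for the model. `refine_summarize` and `fake_llm_refine` are hypothetical names, and the prompts are abbreviated placeholders rather than LangChain's defaults.

```python
from typing import Callable, Sequence

refine_prompts = []  # every prompt sent to the stand-in model


def fake_llm_refine(prompt: str) -> str:
    """Stand-in for a model call: records the prompt and returns a versioned label."""
    refine_prompts.append(prompt)
    return f"summary v{len(refine_prompts)}"


def refine_summarize(chunk_texts: Sequence[str], llm_call: Callable[[str], str]) -> str:
    if not chunk_texts:
        return ""
    # Initial step: summarize the first chunk on its own.
    summary = llm_call(f"Summarize:\n{chunk_texts[0]}")
    # Refine steps: each later chunk is used to update the running summary.
    for chunk in chunk_texts[1:]:
        summary = llm_call(
            f"Existing summary: {summary}\nRefine it with this extra context:\n{chunk}"
        )
    return summary


refined = refine_summarize(["part 1", "part 2", "part 3"], fake_llm_refine)
print(refined)              # summary v3
print(len(refine_prompts))  # 3: one call per chunk
```

Each call depends on the summary produced by the previous one, so unlike the map step of `map_reduce`, these calls cannot run in parallel.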

### refine With Custom Prompts

In [21]:
prompt_template = """Write a concise summary of the following extracting the key information:
Text: `{text}`
CONCISE SUMMARY:"""
initial_prompt = PromptTemplate(template=prompt_template, input_variables=['text'])

refine_template = '''
    Your job is to produce a final summary.
    I have provided an existing summary up to a certain point: {existing_answer}.
    Please refine the existing summary with some more context below.
    ------------
    {text}
    ------------
    Start the final summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
    by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
    
'''
refine_prompt = PromptTemplate(
    template=refine_template,
    input_variables=['existing_answer', 'text']
)


In [22]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=False
)
output_summary = chain.run(chunks)

In [23]:
print(output_summary)

Introduction:
The paper introduces the Transformer, a new network architecture that relies solely on attention mechanisms and eliminates the need for recurrent or convolutional neural networks. The Transformer model achieves superior results in machine translation tasks, offers better parallelization, and requires less training time compared to existing models. The paper also discusses the advantages of self-attention and provides details on the architecture of the encoder and decoder stacks in the Transformer model.

Key Points:
- The attention mechanism used in the Transformer model is called Scaled Dot-Product Attention, which computes the compatibility function between queries, keys, and values.
- The paper compares two commonly used attention functions: additive attention and dot-product attention. Dot-product attention is faster and more space-efficient due to optimized matrix multiplication code.
- Multi-Head Attention is employed in the Transformer model, allowing it to jointly

### Summarizing Using LangChain Agents

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

In [None]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
wikipedia = WikipediaAPIWrapper()

In [None]:
tools = [
    Tool(
        name="Wikipedia", 
        func=wikipedia.run,
        description="Useful for when you need to get information from Wikipedia about a single topic"
    )
]

In [None]:
agent_executor = initialize_agent(tools, llm, agent='zero-shot-react-description', verbose=True)

In [None]:
output = agent_executor.run('Can you please provide a short summary of George Washington?')

In [None]:
print(output)