<a href="https://colab.research.google.com/github/olonok69/LLM_Notebooks/blob/main/langchain/use_cases/Langchain_OpenAI_Use_cases_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain

LangChain is a framework for developing applications powered by language models.

https://python.langchain.com/docs/use_cases

## LangChain Summarization

https://python.langchain.com/docs/get_started/introduction

https://python.langchain.com/docs/use_cases/summarization

https://python.langchain.com/docs/modules/model_io/prompts/

https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter

In [50]:
!pip install -q langchain langchain-community tiktoken
!pip install -q -U accelerate
!pip install -q -U unstructured numpy
!pip install -q openai chromadb

In [51]:

from google.colab import output
output.enable_custom_widget_manager()

In [52]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [53]:
from google.colab import userdata
openai_api_key = userdata.get('KEY_OPENAI')

In [54]:
from langchain_community.llms import OpenAI
from langchain.prompts import PromptTemplate
import pprint


# temperature=0 keeps the completions as repeatable as the API allows
llm = OpenAI(temperature=0, model_name='gpt-3.5-turbo-instruct', openai_api_key=openai_api_key)

# Create our template
template = """
%INSTRUCTIONS:
Summarize the following text and format the output in lines of no more than 30 words. After each line, insert a line break.
Respond in a manner that a non-specialist on the matter would understand.

%TEXT:
{text}
"""

# Create LangChain prompt template
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)
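
To see what the template does without calling LangChain, the substitution can be approximated with plain Python string formatting. This is a minimal sketch, not LangChain's implementation: `render_prompt` and the shortened `TEMPLATE` are hypothetical names introduced here for illustration.

```python
# Hypothetical stand-in for PromptTemplate.format, using only the standard library.
TEMPLATE = """
%INSTRUCTIONS:
Summarize the following text.

%TEXT:
{text}
"""


def render_prompt(template: str, **variables: str) -> str:
    """Fill the named placeholders in `template` with the given values."""
    return template.format(**variables)


rendered = render_prompt(TEMPLATE, text="Boeing is under pressure from regulators.")
print(rendered)
```

The `{text}` placeholder plays the same role as the entry in `input_variables`: it marks where the document to summarize is spliced into the fixed instructions.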

In [55]:
sample1 = """
The head of the Allied Pilots Association, the pilots union for American Airlines, insists he would never board an aircraft if it were not safe.

But he says he can no longer take the quality of the plane he's flying for granted.

"I'm at an alert status that I've never had to be in on a Boeing airplane," he says.

"Because I don't trust that they've followed the processes that have previously kept me safe on Boeing airplanes for over three decades."

Executives at the aerospace giant's shiny new headquarters in Arlington, Virginia could be forgiven for feeling like they are under siege.

Every day seems to bring more bad headlines for the company, which is coming under pressure from regulators and airlines, and has seen its reputation badly damaged.

The trouble began in January, when a disused emergency exit door blew off a brand new Boeing 737 Max shortly after take-off from Portland International Airport.

An initial report from the US National Transportation Safety Board concluded that four bolts meant to attach the door securely to the aircraft had not been fitted.
"""

In [56]:
final_prompt = prompt.format(text=sample1)

print(final_prompt)



%INSTRUCTIONS:
Summarize the following text and format the output in lines of no more than 30 words. After each line, insert a line break.
Respond in a manner that a non-specialist on the matter would understand.

%TEXT:

The head of the Allied Pilots Association, the pilots union for American Airlines, insists he would never board an aircraft if it were not safe.

But he says he can no longer take the quality of the plane he's flying for granted.

"I'm at an alert status that I've never had to be in on a Boeing airplane," he says.

"Because I don't trust that they've followed the processes that have previously kept me safe on Boeing airplanes for over three decades."

Executives at the aerospace giant's shiny new headquarters in Arlington, Virginia could be forgiven for feeling like they are under siege.

Every day seems to bring more bad headlines for the company, which is coming under pressure from regulators and airlines, and has seen its reputation badly damaged.

The trouble began in 

In [57]:
# Use a dedicated name so the google.colab `output` module imported earlier is not shadowed
summary = llm.invoke(final_prompt)
pprint.pprint(summary)

('\n'
 'The head of the pilots union for American Airlines is concerned about the '
 'safety of Boeing planes. He no longer trusts their processes and executives '
 'are under pressure due to recent incidents.')
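
The reply above appears to come back as a single line, so the 30-word line limit requested in the prompt is not guaranteed by the model. One option is to enforce it afterwards in plain Python. The sketch below is an assumption about how that could be done, not part of LangChain; `wrap_by_words` is a hypothetical helper.

```python
def wrap_by_words(text: str, max_words: int = 30) -> str:
    """Re-flow `text` so that no line holds more than `max_words` words."""
    words = text.split()
    lines = [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
    return "\n".join(lines)


# The summary text is copied from the model reply shown above.
raw_summary = (
    "The head of the pilots union for American Airlines is concerned about the "
    "safety of Boeing planes. He no longer trusts their processes and executives "
    "are under pressure due to recent incidents."
)
wrapped = wrap_by_words(raw_summary, max_words=30)
print(wrapped)
```

Splitting on whitespace ignores sentence boundaries, so a line may end mid-sentence; that trade-off keeps the helper short.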


## Summarize a book

In [58]:
from langchain_community.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [59]:
text_path = "/content/drive/MyDrive/data (1)/book.txt"

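# Read the whole book into memory; the encoding is set explicitly to avoid platform defaults
with open(text_path, 'r', encoding='utf-8') as file:
    text = file.read()

# Pretty-print the first 1000 characters
pprint.pprint(text[:1000])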

('Project Gutenberg eBook of The Great Gatsby\n'
 '    \n'
 'This ebook is for the use of anyone anywhere in the United States and\n'
 'most other parts of the world at no cost and with almost no restrictions\n'
 'whatsoever. You may copy it, give it away or re-use it under the terms\n'
 'of the Project Gutenberg License included with this ebook or online\n'
 'at www.gutenberg.org. If you are not located in the United States,\n'
 'you will have to check the laws of the country where you are located\n'
 'before using this eBook.\n'
 '\n'
 'Title: The Great Gatsby\n'
 '\n'
 '\n'
 'Author: F. Scott Fitzgerald\n'
 '\n'
 'Release date: January 17, 2021 [eBook #64317]\n'
 '                Most recently updated: February 2, 2024\n'
 '\n'
 'Language: English\n'
 '\n'
 'Credits: Produced by Alex Cabal for the Standard Ebooks project, based on a '
 'transcription produced for Project Gutenberg Australia.\n'
 '\n'
 '\n'
 '*** START OF THE PROJECT GUTENBERG EBOOK THE GREAT GATSBY ***\n'
 '\n'
 '\n

In [60]:
num_tokens = llm.get_num_tokens(text)

print(f"There are {num_tokens} tokens in your file")

There are 69499 tokens in your file
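
A quick back-of-the-envelope check shows why the book cannot be summarized in a single call. The token count comes from the cell above; the context window and the space reserved for the reply are assumptions chosen for illustration, so substitute the limits of the model you actually use.

```python
import math

NUM_TOKENS = 69_499        # value reported by llm.get_num_tokens(text) above
CONTEXT_WINDOW = 4_096     # ASSUMPTION: context size of the model in use
RESERVED_FOR_REPLY = 256   # ASSUMPTION: tokens kept free for the summary itself

# Tokens available for input text in one call
budget = CONTEXT_WINDOW - RESERVED_FOR_REPLY
fits_in_one_call = NUM_TOKENS <= budget
# Lower bound on the number of chunks, ignoring overlap and prompt overhead
min_chunks = math.ceil(NUM_TOKENS / budget)

print(f"Fits in one call: {fits_in_one_call}")
print(f"At least {min_chunks} chunks are needed")
```

Under these assumptions the text has to be split before it is summarized, which is what the following cells do.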


In [63]:
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=300)
docs = text_splitter.create_documents([text])

print(f"You now have {len(docs)} docs instead of 1 piece of text")

You now have 63 docs instead of 1 piece of text
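
The roles of `chunk_size` and `chunk_overlap` can be illustrated with a fixed-width sliding window. This is a simplified sketch, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter cuts at the listed separators and treats `chunk_size` as an upper bound, whereas the hypothetical `split_with_overlap` below cuts at fixed character offsets.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of up to `chunk_size` characters, each sharing
    `chunk_overlap` characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


demo_text = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = split_with_overlap(demo_text, chunk_size=10, chunk_overlap=3)
for chunk in demo_chunks:
    print(chunk)
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary is still seen in full by at least one summarization call.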


In [64]:
docs[0].page_content

'Project Gutenberg eBook of The Great Gatsby\n    \nThis ebook is for the use of anyone anywhere in the United States and\nmost other parts of the world at no cost and with almost no restrictions\nwhatsoever. You may copy it, give it away or re-use it under the terms\nof the Project Gutenberg License included with this ebook or online\nat www.gutenberg.org. If you are not located in the United States,\nyou will have to check the laws of the country where you are located\nbefore using this eBook.\n\nTitle: The Great Gatsby\n\n\nAuthor: F. Scott Fitzgerald\n\nRelease date: January 17, 2021 [eBook #64317]\n                Most recently updated: February 2, 2024\n\nLanguage: English\n\nCredits: Produced by Alex Cabal for the Standard Ebooks project, based on a transcription produced for Project Gutenberg Australia.\n\n\n*** START OF THE PROJECT GUTENBERG EBOOK THE GREAT GATSBY ***\n\n\n\n\n                           The Great Gatsby\n                                  by\n                  

In [68]:
len(docs[0].page_content)

4746

In [69]:
docs[0].metadata

{}

In [70]:
# Get your chain ready to use (pass verbose=True to log the intermediate prompts)
chain = load_summarize_chain(llm=llm, chain_type='map_reduce')

In [71]:
output = chain.run(docs)
pprint.pprint(output)

('\n'
 '\n'
 '"The Great Gatsby" by F. Scott Fitzgerald is a free ebook available through '
 "Project Gutenberg. It tells the story of Gatsby's love for Daisy and his "
 'mysterious past, as observed by the narrator Nick. The passage also '
 "discusses Project Gutenberg's mission and terms for using and distributing "
 'free electronic works. ')
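
Conceptually, the `map_reduce` chain type summarizes each chunk on its own and then summarizes the collected partial summaries. The sketch below shows that control flow with a toy summarizer in place of the LLM. It is an illustration under stated assumptions, not LangChain's code: `map_reduce_summarize`, `first_sentence`, and the sample chunks are hypothetical, and the real chain may also collapse intermediate summaries in several rounds.

```python
from typing import Callable


def map_reduce_summarize(chunks: list[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk separately (map), then summarize the joined
    partial summaries (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    combined = "\n".join(partial_summaries)
    return summarize(combined)  # reduce step


def first_sentence(text: str) -> str:
    """Toy stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


calls = []


def counting_summarizer(text: str) -> str:
    """Record every input so the number of 'model calls' can be inspected."""
    calls.append(text)
    return first_sentence(text)


toy_chunks = [
    "Nick moves to West Egg. He rents a small house.",
    "Gatsby throws lavish parties. Nobody knows his past.",
]
result = map_reduce_summarize(toy_chunks, counting_summarizer)
print(result)
print(f"Summarizer calls: {len(calls)}")
```

With N chunks this pattern makes N map calls plus at least one reduce call, so chunk size directly affects how many requests a long document costs.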


In [72]:
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=300)
docs = text_splitter.create_documents([sample1])

print(f"You now have {len(docs)} docs instead of 1 piece of text")

You now have 1 docs instead of 1 piece of text


In [73]:
# Get your chain ready to use (pass verbose=True to log the prompt sent to the model)
chain1 = load_summarize_chain(llm=llm, chain_type='stuff')

In [74]:
output1 = chain1.run(docs)
pprint.pprint(output1)

(' The head of the pilots union for American Airlines expresses concern about '
 'the safety of Boeing airplanes and states that he no longer trusts the '
 "company's processes. Boeing is facing backlash and damage to its reputation "
 'after a series of incidents, including an emergency exit door blowing off a '
 'new 737 Max due to missing bolts.')
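
The `stuff` chain type takes the opposite approach to `map_reduce`: all documents are placed into one prompt and sent in a single call, which only works when the combined text fits the model's context window. A minimal sketch of the idea follows; `build_stuff_prompt`, the instruction wording, and the sample paragraphs are hypothetical and not LangChain's actual prompt.

```python
def build_stuff_prompt(chunks: list[str], instruction: str) -> str:
    """Join every chunk into one block of text and prepend the instruction,
    so that a single model call sees the whole input."""
    joined = "\n\n".join(chunks)
    return f"{instruction}\n\n{joined}"


sample_chunks = ["First paragraph of the article.", "Second paragraph of the article."]
stuff_prompt = build_stuff_prompt(sample_chunks, "Write a concise summary of the following:")
print(stuff_prompt)
```

Because the news sample above splits into a single document, `stuff` is a natural fit there, while the full book needs `map_reduce`.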


In [75]:
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=15000, chunk_overlap=300)
docs = text_splitter.create_documents([text])

print(f"You now have {len(docs)} docs instead of 1 piece of text")

You now have 20 docs instead of 1 piece of text


In [76]:
output = chain.run(docs)
pprint.pprint(output)

(" This ebook is a free version of F. Scott Fitzgerald's novel, The Great "
 'Gatsby, available for use in most parts of the world under the Project '
 'Gutenberg License. It follows the story of a man named Gatsby and his '
 'relationship with the narrator, who reflects on his own life and the lives '
 'of those around him, including the wealthy and powerful Tom Buchanan. Set in '
 'the 1920s, the novel explores themes of love, wealth, and the American '
 'Dream. The narrator also discusses his experiences in New York and his final '
 'encounter with Gatsby before leaving. The ebook can be distributed and '
 'accessed in various formats, but must include the full Project Gutenberg '
 'License. Donations to the non-profit organization are appreciated to '
 'continue providing free electronic works. ')

