## How can a text be submitted to an LLM to be summarized?

This project uses [OpenAI's GPT-3.5 Turbo Instruct](https://platform.openai.com/docs/models/gpt-3-5-turbo), which has a context window of 4,096 tokens. The prompt and the generated summary both count toward that limit.
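
To get a feel for the limit before sending anything to the model, here is a minimal sketch that estimates a text's token count. It assumes a rule of thumb of about 4 characters per token for English text, and the `reserved_for_summary` value is an arbitrary allowance for the model's reply. Both are assumptions for illustration; an actual tokenizer (for example the `tiktoken` library) would give exact counts.

```python
import math

CONTEXT_WINDOW = 4096   # token limit stated above
CHARS_PER_TOKEN = 4     # assumed rule of thumb for English text, not an exact count


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def probably_fits(text: str, reserved_for_summary: int = 256) -> bool:
    """Return True if the estimated prompt size leaves room for the summary."""
    return estimate_tokens(text) + reserved_for_summary <= CONTEXT_WINDOW


short_text = "A large language model (LLM) is a computational model."
long_text = "word " * 20_000  # 100,000 characters

print(estimate_tokens(short_text), probably_fits(short_text))
print(estimate_tokens(long_text), probably_fits(long_text))
```

The estimate is deliberately crude. It is only meant to flag texts that are clearly too long before any API credit is spent.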

# Setup

In [None]:
# download & install libraries
!pip --quiet install openai==1.10.0 # LLM
!pip --quiet install langchain==0.1.4 # framework for working with LLMs
!pip --quiet install pypdf==3.10.0 # load PDFs for LangChain processing

To run this project you'll need an OpenAI account, which you will use to obtain an OpenAI API key.  Keep your API key secret.

While creating these lessons, I loaded my account with a \$5.00 credit and turned off Auto Recharge, so requests made with the API key stop working once the balance reaches \$0. In total, I used only \$0.43.

*   [Directions for obtaining an OpenAI API Key](https://help.openai.com/en/)
*   [Create/manage your OpenAI API keys](https://platform.openai.com/api-keys)
*   [Track your OpenAI API key usage/costs](https://platform.openai.com/usage)

Example (not a real key):
> %env OPENAI_API_KEY=uy-test-sdf87ewrkhjcdsoiewe2DSFIIF234234jhk23rHJJKH323jk



In [None]:
# paste your OpenAI API key without quotation marks
%env OPENAI_API_KEY=
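
Before moving on, it can help to confirm that the key actually reached the environment. The sketch below is one possible check, and the helper name `api_key_is_set` is made up for this example. It only tests that the variable is non-empty; it does not verify that the key is valid, and it never prints the key.

```python
import os


def api_key_is_set(env=os.environ) -> bool:
    """Return True if OPENAI_API_KEY is present and not blank."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if api_key_is_set():
    print("OPENAI_API_KEY is set.")
else:
    print("OPENAI_API_KEY is missing; paste it into the %env line above and rerun.")
```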

# Summarizing a String

In [None]:
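# import LangChain
# (in langchain 0.1.x the top-level `from langchain import OpenAI, PromptTemplate`
#  still works but emits deprecation warnings, so import from the submodules)
from langchain_community.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import PyPDFLoader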

In [None]:
# initialize the large language model
# will not run without providing an OPENAI_API_KEY above

# temperature set to 0 for predictable output
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)

In [None]:
# initialize a PromptTemplate with 2 parameters:
#     input_variables: names of the placeholders that appear in the template
#     template: prompt text containing each placeholder in curly braces
summarization_prompt = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following text to one sentence: {text}",
)
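
To see what the template contributes, the sketch below fills the same template string using Python's built-in `str.format`. This is a stand-in for illustration, not LangChain's own code: it assumes the placeholder substitution behaves like ordinary string formatting, and the sample sentence is made up.

```python
# Stand-in for filling a prompt template, using only built-in string formatting.
template = "Summarize the following text to one sentence: {text}"

# Substitute a made-up sample sentence for the {text} placeholder.
filled_prompt = template.format(text="LLMs predict the next token.")
print(filled_prompt)
# prints: Summarize the following text to one sentence: LLMs predict the next token.
```

The filled-in string is what ultimately gets sent to the model as the prompt.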

In [None]:
# assemble a chain that includes both llm & prompt
summarization_chain = LLMChain(llm=llm, prompt=summarization_prompt)

In [None]:
# text to summarize (three sentences from the opening of Wikipedia's "Large language model" page, retrieved 7/23/2024)
sentence1 = "A large language model (LLM) is a computational model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification."
sentence2 = " Based on language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a computationally intensive self-supervised and semi-supervised training process."
sentence3 = " LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word."
text = sentence1 + sentence2 + sentence3

In [None]:
# prompt the LLM
summarized_text = summarization_chain.predict(text=text)
print(summarized_text)


# TINKER:

# 1) Replace sentence1, sentence2 and sentence3 with alternate sentences, then
#     rerun to observe the different output.
#     Example:
#          # text to summarize (first paragraph of Narrative of the Life of Frederick Douglass)
#          text = "I was born in Tuckahoe, near Hillsborough, and about twelve miles from Easton, in Talbot county, Maryland. I have no accurate knowledge of my age, never having seen any authentic record containing it. By far the larger part of the slaves know as little of their ages as horses know of theirs, and it is the wish of most masters within my knowledge to keep their slaves thus ignorant. I do not remember to have ever met a slave who could tell of his birthday. They seldom come nearer to it than planting-time, harvest-time, cherry-time, spring-time, or fall-time. A want of information concerning my own was a source of unhappiness to me even during childhood. The white children could tell their ages. I could not tell why I ought to be deprived of the same privilege. I was not allowed to make any inquiries of my master concerning it. He deemed all such inquiries on the part of a slave improper and impertinent, and evidence of a restless spirit. The nearest estimate I can give makes me now between twenty-seven and twenty-eight years of age. I come to this, from hearing my master say, some time during 1835, I was about seventeen years old."

# 2) Predict what will happen if you submit a text too long for GPT 3.5 Turbo
#      Instruct's input limit of 4,096 tokens.  Submit a very long text and
#      observe the result.
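
One common way around the limit is to split a long text into chunks, summarize each chunk, and then summarize the combined partial summaries. The sketch below illustrates that idea in plain Python. The helper names, the 3,000-token budget, and the 4-characters-per-token estimate are all assumptions made for this example, and `fake_summarize` is a hypothetical stand-in for a real LLM call.

```python
from typing import Callable, List

CHARS_PER_TOKEN = 4  # assumed rule of thumb, not an exact tokenizer


def split_into_chunks(text: str, max_tokens: int = 3000) -> List[str]:
    """Split text on whitespace into chunks that stay under a rough token budget."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # +1 accounts for the space that joins this word to the previous one
        if current and current_len + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(word)
        current_len += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long_text(text: str, summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    partial_summaries = [summarize(chunk) for chunk in split_into_chunks(text)]
    if len(partial_summaries) == 1:
        return partial_summaries[0]
    return summarize(" ".join(partial_summaries))


def fake_summarize(chunk: str) -> str:
    """Hypothetical stand-in for an LLM call: keep the first five words."""
    return " ".join(chunk.split()[:5])


long_text = "token " * 10_000
print(len(split_into_chunks(long_text)), "chunks")
print(summarize_long_text(long_text, fake_summarize))
```

With the chain defined earlier, the stand-in could be swapped for something like `lambda chunk: summarization_chain.predict(text=chunk)`. For very long inputs the combined partial summaries may themselves exceed the limit, so this sketch would need another round of splitting in that case.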

# Summarizing a PDF using PyPDF

In [None]:
# assemble a summarization chain around the llm
# (chain_type defaults to "stuff", which sends all pages in a single prompt)
summarize_chain = load_summarize_chain(llm)
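
When no `chain_type` is given, `load_summarize_chain` uses the "stuff" strategy: every page is placed into one prompt. The sketch below imitates that idea in plain Python so the consequence for the token limit is easy to see. The prompt wording and the helper name `build_stuff_prompt` are invented for illustration; LangChain ships its own default prompt.

```python
from typing import List

# Hypothetical prompt wording for illustration; LangChain ships its own default.
STUFF_PROMPT = "Write a concise summary of the following:\n\n{text}\n\nSummary:"


def build_stuff_prompt(page_texts: List[str]) -> str:
    """Join every page into one block of text and place it in a single prompt."""
    combined = "\n\n".join(page_texts)
    return STUFF_PROMPT.format(text=combined)


pages = ["Page one text.", "Page two text."]
prompt = build_stuff_prompt(pages)
print(prompt)
print(len(prompt), "characters in a single request")
```

Because the whole document travels in one request, the prompt grows with every page. For longer PDFs, passing `chain_type="map_reduce"` to `load_summarize_chain` summarizes pages separately and then combines the results.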

In [None]:
# use PyPDF to load the PDF

# this URL points to "Data Responsibility in Humanitarian Action", a public
# PDF accessible via The Humanitarian Data Exchange at data.humdata.org
document_loader = PyPDFLoader(file_path="https://data.humdata.org/dataset/2048a947-5714-4220-905b-e662cbcd14c8/resource/77866cc2-d5eb-42c6-995c-75ec9ff02ca8/download/data-responsibility-in-humanitarian-action.pdf")
# load() returns a list of Document objects, one per PDF page
document = document_loader.load()

In [None]:
# summarize the PDF
summary = summarize_chain(document)
print(summary['output_text'])


# TINKER:

# 1) Load your own PDF files and observe how the LLM summarizes them.
#     You can search for publicly available files online.
#     An online PDF file can be loaded directly by URL.
#     Note that a long PDF might exceed the LLM's token limit.

# 2) Change the LLM temperature and re-run several times to observe how it
#     changes the summary.
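
For intuition about what temperature does, the sketch below applies the standard softmax-with-temperature formula to a few made-up token scores. It is a conceptual illustration only. How OpenAI applies the setting internally is not covered here, and a temperature of 0 is treated in this sketch as "always pick the top score" rather than being computed.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as 'always pick the top score'")
    scaled = [score / temperature for score in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, which is why a low setting gives more repeatable summaries. Higher temperatures spread probability across alternatives, so reruns vary more.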