# Summarize PDF content using OpenAI

## Set the OpenAI API key

In [1]:
import getpass

openai_api_key = getpass.getpass("OpenAI API key: ")

OpenAI API key:  ········


In [3]:
import os

os.environ["OPENAI_API_KEY"] = openai_api_key

In [4]:
!pip install -qqq langchain openai

In [6]:
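from langchain.llms import OpenAI

# Note: OpenAI has since retired text-davinci-003; gpt-3.5-turbo-instruct is the suggested
# replacement on the completions endpoint if this cell fails on a current account.
llm = OpenAI(model_name="text-davinci-003")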

In [7]:
llm("hello, this is a test")

'\n\nHello there! It looks like you are trying out a test. How is it going so far?'

In [8]:
from IPython.display import Markdown

In [9]:
Markdown(llm("hello, this is a test"))



This is a great test! It looks like you are trying to determine if the message is readable. If so, congratulations! The message is readable and understandable.

In [15]:
%pip install -qqq pypdf

Note: you may need to restart the kernel to use updated packages.


## Read the PDF content

In [16]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("./embeddings.pdf")
pages = loader.load_and_split()

In [21]:
len(pages)

82

In [37]:
content = "\n\n".join([page.page_content for page in pages[3:6]])
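`pages[3:6]` is an ordinary Python slice, so it keeps the chunks at indices 3, 4 and 5. Because `load_and_split()` may cut a long page into several chunks, a chunk index is not guaranteed to equal a PDF page number. For `PyPDFLoader`, each Document's `metadata` carries a 0-based `page` entry that can be filtered on instead. The sketch below is illustrative only. It uses a hypothetical `DemoDoc` stand-in that exposes only the two Document attributes it needs (`page_content`, `metadata`), and a hypothetical `join_by_page` helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DemoDoc:
    """Hypothetical stand-in exposing the two Document attributes used here."""
    page_content: str
    metadata: Dict = field(default_factory=dict)


def join_by_page(docs: List[DemoDoc], first_page: int, last_page: int, sep: str = "\n\n") -> str:
    """Join the text of every chunk whose 0-based metadata["page"] lies in [first_page, last_page]."""
    return sep.join(
        doc.page_content
        for doc in docs
        if first_page <= doc.metadata.get("page", -1) <= last_page
    )


# Page 3 was split into two chunks, so chunk indices and page numbers drift apart.
demo_docs = [
    DemoDoc("p2", {"page": 2}),
    DemoDoc("p3a", {"page": 3}),
    DemoDoc("p3b", {"page": 3}),
    DemoDoc("p4", {"page": 4}),
    DemoDoc("p5", {"page": 5}),
    DemoDoc("p6", {"page": 6}),
]
print([d.page_content for d in demo_docs[3:6]])     # ['p4', 'p5', 'p6']  (slice by index)
print(join_by_page(demo_docs, 3, 5).split("\n\n"))  # ['p3a', 'p3b', 'p4', 'p5']  (filter by page)
```

The real Documents expose the same attributes, so `join_by_page(pages, 3, 5)` should work on the loaded `pages` list. It is assumed to return the text of PDF pages 4 to 6 regardless of how they were chunked.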

In [38]:
Markdown(content)

1 Introduction
Implementing deep learning models has become an increasingly important machine learning strategy¹ for companies looking to build data-driven products. In order to build and power deep learning models, companies collect and feed hundreds of millions of terabytes of multimodal² data into deep learning models. As a result, embeddings, deep learning models' internal representations of their input data, are quickly becoming a critical component of building machine learning systems.
For example, they make up a significant part of Spotify's item recommender systems [27], YouTube video recommendations of what to watch [11], and Pinterest's visual search [31]. Even if they are not explicitly presented to the user through recommendation system UIs, embeddings are also used internally at places like Netflix to make content decisions around which shows to develop based on user preference popularity.
Figure 1: Left to right, products that use embeddings to generate recommended items: Spotify Radio, YouTube video recommendations, visual recommendations at Pinterest, and BERT embeddings in suggested Google search results.
The usage of embeddings to generate compressed, context-specific representations of content exploded in popularity after the publication of Google's Word2Vec paper [47].
¹ Check out the machine learning industry landscape that Matt Turck puts together every year, which has exploded in size.
² Multimodal means a variety of data, usually including text, video, and audio and, more recently, as shown in Meta's ImageBind, depth, thermal, and IMU data.

Figure 2: Embeddings papers on arXiv by month. Note the decline in frequency of embeddings-specific papers, possibly in tandem with the rise of deep learning architectures like GPT (source).
Building and expanding on the concepts in Word2Vec, the Transformer [66] architecture, whose self-attention mechanism is a much more specialized way of calculating the context around a given word, has become the de facto way to learn representations of growing multimodal vocabularies. Its rise in popularity in both academia and industry has made embeddings a staple of deep learning workflows.
However, the concept of embeddings can be elusive because they are neither data-flow inputs nor output results; they are intermediate elements that live within machine learning services to refine models. So it is helpful to define them explicitly from the beginning.
As a general definition, embeddings are data that has been transformed into n-dimensional matrices for use in deep learning computations. The process of embedding (as a verb):

• Transforms multimodal input into representations that are easier to perform intensive computation on, in the form of vectors, tensors, or graphs [51]. For the purpose of machine learning, we can think of vectors as a list (or array) of numbers.
• Compresses input information for use in a machine learning task (the type of methods available to us in machine learning to solve specific problems), such as summarizing a document, identifying tags or labels for social media posts, or performing semantic search on a large text corpus. Compression turns variable-size feature dimensions into fixed-size inputs, allowing them to be passed efficiently into downstream components of machine learning systems.
• Creates an embedding space that is specific to the data the embeddings were trained on but that, in the case of deep learning representations, can also generalize to other tasks and domains through transfer learning (the ability to switch contexts), which is one of the reasons embeddings have exploded in popularity across machine learning applications.

What do embeddings actually look like? Here is one single embedding, also called a vector, in three dimensions. We can think of this as a representation of a single element in our dataset. For example, this hypothetical embedding represents the single word "fly" in three dimensions. Generally, we represent individual embeddings as row vectors.

$$
\begin{bmatrix} 1 & 4 & 9 \end{bmatrix} \tag{1}
$$
And here is a tensor, also known as a matrix³, which is a multidimensional combination of vector representations of multiple elements. For example, this could be the representation of "fly" and "bird":
1 4 9
4 5 6
(2)
These embeddings are the output of the process of learning embeddings,
which we do by passing raw input data into a machine learning model. We
transform that multidimensional input data by compressing it, through the
algorithms we discuss in this paper, into a lower-dimensional space. The
result is a set of vectors in an embedding space.
Figure 3: The process of embedding. Multimodal data (a word, a sentence, an image) passes through an algorithm into an embedding space, illustrated with the vectors [1, 4, 9], [1, 4, 7], and [12, 0, 3].
We often talk about item embeddings being in X dimensions, ranging anywhere from 100 to 1000, with diminishing returns in usefulness somewhere beyond 200-300 in the context of using them for machine learning problems⁴. This means that each item (image, song, word, etc.) is represented by a vector of length X, where each value is a coordinate in an X-dimensional space.
We just made up an embedding for "bird", but let's take a look at what a real one for the word "hold" would look like in the following quote, as generated by the BERT deep learning model:
"Hold fast to dreams, for if dreams die, life is a broken-winged bird that
cannot fly." — Langston Hughes
We’ve highlighted this quote because we’ll be working with this sentence
as our input example throughout this text.
³ The difference between a matrix and a tensor is that it's a matrix if you're doing linear algebra and a tensor if you're an AI researcher.
⁴ Embedding size is tunable as a hyperparameter, but so far there have been only a few papers on optimal embedding size, with most embedding sizes set through magic and guesswork.
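The excerpt treats a single embedding as a row vector of X numbers and several embeddings as a matrix with one row per item. The minimal pure-Python sketch below shows those shapes and reuses the excerpt's hypothetical "fly" and "bird" values. It uses plain lists rather than NumPy arrays so that it has no dependencies. The `mean_pool` helper is an illustrative assumption and is not described in the PDF. Averaging is just one simple way to turn a variable number of input vectors into one fixed-length vector, which echoes the excerpt's point about compressing variable-size input into fixed-size input. Real models learn their embedding values during training.

```python
from typing import List, Tuple

Vector = List[float]

# One embedding: the hypothetical 3-dimensional row vector for "fly" from Equation (1).
fly_embedding: Vector = [1, 4, 9]

# Several embeddings stacked as rows form a matrix, as in Equation (2): one row per item.
embedding_matrix: List[Vector] = [
    [1, 4, 9],  # "fly"
    [4, 5, 6],  # "bird"
]


def shape(matrix: List[Vector]) -> Tuple[int, int]:
    """Return (rows, columns), assuming every row has the same length."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average any number of equal-length vectors into a single vector of that same length."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical inputs of different lengths (2 vs 3 vectors) both compress to 3 numbers.
short_input = [[1, 4, 9], [3, 6, 1]]
long_input = [[1, 4, 9], [3, 6, 1], [2, 2, 2]]
print(shape(embedding_matrix))  # (2, 3)
print(mean_pool(short_input))   # [2.0, 5.0, 5.0]
print(mean_pool(long_input))    # [2.0, 4.0, 4.0]
```

With X = 300 instead of 3, each row would simply hold 300 numbers. The matrix would still have one row per item.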

## Summarize the content above

In [39]:
response = llm(f"""Summarize below content:
{content}""")

Markdown(response)



Embeddings are a way of transforming data (e.g. text, images, audio, etc) into numerical representations that can be used for machine learning algorithms. Embeddings are often represented as vectors, tensors, or graphs, and are used to compress input information and create an embedding space which is specific to the data they have been trained on. They can also generalize to other tasks and domains using transfer learning. Embeddings are typically set to a size of 100-1000, where each value is a coordinate in a multidimensional space.
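Three pages fit in a single prompt here, but text-davinci-003 accepts only about 4,097 tokens for prompt and completion combined. Joining all 82 chunks would therefore very likely overflow it. A common workaround is "map-reduce" summarization: summarize each chunk on its own, then summarize the partial summaries. LangChain ships a version of this as `load_summarize_chain(llm, chain_type="map_reduce")` in `langchain.chains.summarize`.

Below is a dependency-free sketch of the same idea. It is not LangChain's implementation. The helper names `split_text` and `map_reduce_summarize`, the prompts, and the 3,000-character chunk size are all assumptions. Characters are only a rough proxy for tokens; about 4 characters per token is a common rule of thumb for English. `complete` is any function that maps a prompt string to a completion string.

```python
from typing import Callable, List


def split_text(text: str, chunk_size: int = 3000, overlap: int = 200) -> List[str]:
    """Cut text into windows of at most chunk_size characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def map_reduce_summarize(text: str, complete: Callable[[str], str],
                         chunk_size: int = 3000, overlap: int = 200) -> str:
    """Map: summarize each chunk separately. Reduce: merge the partial summaries in one final call."""
    partial = [complete(f"Summarize the following content:\n{chunk}")
               for chunk in split_text(text, chunk_size, overlap)]
    if not partial:
        return ""
    if len(partial) == 1:
        return partial[0]  # short input: no reduce step needed
    combined = "\n\n".join(summary.strip() for summary in partial)
    return complete(f"Combine these partial summaries into one summary:\n{combined}")
```

Since the notebook's `llm` accepts a plain string (as in `llm("hello, this is a test")`), a call like `map_reduce_summarize("\n\n".join(p.page_content for p in pages), llm)` should summarize the whole PDF. That run costs one API request per chunk plus one more. With many chunks, the combined partial summaries can themselves exceed the context window. A fuller version would then repeat the reduce step on groups of summaries.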