# Part 2: How to Tame an LLM
## **A Guide to Using LLMs Practically**

### How do LLMs really work?
- Asking how LLMs work is a useful thought experiment for deciding how to use them. ([Sidenote: This is where it all started](https://arxiv.org/abs/1706.03762)).

- Turns out they're fairly similar to us in some regards (long-term & short-term 'memory', ability to '[pay attention](https://arxiv.org/pdf/2307.03172.pdf)').
- Where possible, put LLMs in a position to use their short-term memory (the prompt) and help them pay attention.
- LLMs are great at [pattern matching](https://arxiv.org/abs/2005.14165) and following syntactic rules.
- LLMs can often produce a solution to a problem, but [that solution may be suboptimal](https://aclanthology.org/2023.findings-acl.426.pdf).
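
One way to "help them pay attention", sketched under the assumption that the text chunks arrive already ranked from most to least relevant: the attention paper linked above reports that models use information at the start and end of a long context more reliably than information in the middle. The hypothetical helper `reorder_for_attention` below therefore places the strongest chunks at the edges. The function name and the sample chunks are illustrative only.

```python
def reorder_for_attention(ranked_chunks):
    """Place the most relevant chunks at the start and end of the context.

    `ranked_chunks` is assumed to be sorted from most to least relevant.
    """
    front, back = [], []
    for position, chunk in enumerate(ranked_chunks):
        # Alternate: even ranks go to the front, odd ranks go to the back
        if position % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-best chunk ends up last
    return front + back[::-1]


chunks = ["best", "second", "third", "fourth", "fifth"]
print(reorder_for_attention(chunks))
```

With this ordering the least relevant chunk lands in the middle of the context, where it matters least if it is overlooked.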

### Retrieval Augmented Generation (RAG) - An Anti-Hallucination Antidote
- RAG is one way of limiting the use of LLMs to what they are best at.

- Uses [in-context learning](https://arxiv.org/abs/2301.00234) to give the LLM a usable short-term memory.

Let's use this to ask questions about some lecture notes.
### 1 - Reading my PDF in Python



In [1]:
!pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [8]:
from PyPDF2 import PdfReader
reader = PdfReader("Lecture Notes.pdf")

# Extract the text of each page and join it all into a single string
lecture_notes = ''.join([page.extract_text() for page in reader.pages])

In [5]:
import openai
import os
from dotenv import load_dotenv
load_dotenv()

# Load our OpenAI API key
openai.api_key = os.getenv("api_key")

### Augmented Generation - RAG's little brother

In [7]:
def ask_query(query, context):

    # Tell the LLM to only use the data we give it
    guide_prompt = f"""Use only the following context to answer the query at the end:

    Context:
    {context}

    Query:
    {query}
    """
    messages = [{'role': 'user', 'content': guide_prompt}]

    # Note: openai.ChatCompletion is the interface of the openai package before version 1.0
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=100
    )

    response_content = response.choices[0].message.content

    return response_content

# The notes are passed twice here, which makes the prompt too long for the model
ask_query(query="What is an objective function?", context=lecture_notes * 2)

InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 7252 tokens. Please reduce the length of the messages.
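
The error above can be anticipated before calling the API. The sketch below uses the rough rule of thumb that one token is about four characters of English text. This is only an approximation (an exact count would need the model's tokenizer), and `estimate_tokens`, `fits_in_context`, and the 100-token reply reserve are names and values assumed for illustration. The 4097 limit is taken from the error message.

```python
CONTEXT_LIMIT = 4097      # from the error message above
CHARS_PER_TOKEN = 4       # rough heuristic for English text, not exact


def estimate_tokens(text):
    """Approximate the token count of `text` from its character length."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(prompt, reply_tokens=100, limit=CONTEXT_LIMIT):
    """Return True if the prompt plus the reserved reply likely fits."""
    return estimate_tokens(prompt) + reply_tokens <= limit


short_prompt = "What is an objective function?"
long_prompt = "optimization " * 5000   # stand-in for very long notes

print(fits_in_context(short_prompt))
print(fits_in_context(long_prompt))
```

A check like this only signals that the prompt is too long; it does not decide which parts of the notes to keep, which is the problem the rest of this section addresses.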

How can we make this augmented approach more scalable?

In [27]:
# Split by sentence (roughly)
sentences = lecture_notes.split('. ')

# Print the lecture notes split by sentence
print('\n\n--------- Sentence Break --------- \n\n'.join(sentences))

2 CHAPTER 1

--------- Sentence Break --------- 

INTRODUCTION
1.1 Introduction
Optimization is the act of achieving the best possible result under given circumstances.
In design, construction, maintenance, ..., engineers have to take decisions

--------- Sentence Break --------- 

The goal of all
such decisions is either to minimize effort or to maximize benefit.
The effort or the benefit can be usually expressed as a function of certain design variables.
Hence, optimization is the process of finding the conditions that give the maximum or the
minimum value of a function.
It is obvious that if a point x⋆ corresponds to the minimum value of a function f(x), the
same point corresponds to the maximum value of the function −f(x)

--------- Sentence Break --------- 

Thus, optimization
can be taken to be minimization.
There is no single method available for solving all optimization problems efficiently

--------- Sentence Break --------- 

Hence,
a number of methods have been developed for solving d
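
Splitting on `'. '` drops the full stop and leaves the PDF's line breaks inside each piece. A minimal alternative sketch, assuming a regular expression is good enough for these notes: collapse whitespace first, then split after sentence-ending punctuation. `split_sentences` is a hypothetical helper, and abbreviations such as "e.g." will still cause false breaks.

```python
import re


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    # Collapse newlines and repeated spaces left over from PDF extraction
    flat = re.sub(r"\s+", " ", text).strip()
    # Split on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", flat)
    return [part for part in parts if part]


sample = "Optimization is the act of achieving\nthe best result. The goal is to minimize effort."
for sentence in split_sentences(sample):
    print(sentence)
```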

### 2 - Converting our lecture notes into numbers

- We understand letters, LLMs understand numbers.

- Let's use a core natural language processing (NLP) technique known as text **'embedding'** (another word for representing text as a numerical vector).
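
To make "representing text as a numerical vector" concrete, the toy sketch below compares hand-made three-dimensional vectors with cosine similarity, a common way to measure how close two embeddings are. The vectors are invented for illustration; real embedding models produce vectors with hundreds of dimensions, and the distance metric a given vector store uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings", for illustration only
objective_function = [0.9, 0.1, 0.0]
cost_function = [0.8, 0.2, 0.1]
lunch_menu = [0.0, 0.1, 0.9]

print(cosine_similarity(objective_function, cost_function))  # close to 1
print(cosine_similarity(objective_function, lunch_menu))     # close to 0
```

Texts with related meanings should end up with vectors pointing in similar directions, which is what lets a query be matched against sentences that share no exact words with it.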

In [31]:
!pip install chromadb




In [10]:
import chromadb

# Initialize a vector store to hold our text and its embeddings
chroma_client = chromadb.Client()
vector_store = chroma_client.create_collection(name="lecture_notes")

In [11]:
# Add our sentences to the vector store (this also creates their vector embeddings behind the scenes)
vector_store.add(
    documents=sentences,
    ids=[f"id{sentence_num}" for sentence_num in range(len(sentences))]
)

In [9]:
# Query our own lecture notes in the vector store to get the sentences most similar to our query
vector_store.query(
    query_texts=["What is an objective function?"],
    n_results=5
)

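NameError: name 'vector_store' is not defined

This error comes from running the cell out of order: it was executed (`In [9]`) before the cell that creates `vector_store` (`In [10]`). Run in the order shown, the query returns the five sentences most similar to the query text.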
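
Chroma's `query` returns a dictionary whose values are lists with one entry per query text, which is why the next section indexes `['documents'][0]`. The sketch below pairs each returned sentence with its distance. `pair_matches` is a hypothetical helper, and `example_result` is a hand-written stand-in shaped like a query result, not real output from the lecture notes.

```python
def pair_matches(result, query_index=0):
    """Return (document, distance) pairs for one query, closest first."""
    documents = result["documents"][query_index]
    distances = result["distances"][query_index]
    # Smaller distance means the sentence is more similar to the query
    return sorted(zip(documents, distances), key=lambda pair: pair[1])


# Hand-written stand-in with the same shape as a query result
example_result = {
    "ids": [["id7", "id2"]],
    "documents": [["sentence about objective functions", "sentence about constraints"]],
    "distances": [[0.42, 0.87]],
}

for document, distance in pair_matches(example_result):
    print(f"{distance:.2f}  {document}")
```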

### 3 - Constraining our LLM with only lecture notes

In [49]:
def ask_query(query):

    # Get the most relevant sentences to our query
    context = vector_store.query(
        query_texts=[query],
        n_results=5
    )
    context_list = context['documents'][0]
    context_string = '\n'.join(context_list)

    # Tell the LLM to only use the data we give it
    guide_prompt = f"""Use only the following context to answer the query at the end:

    Context:
    {context_string}

    Query:
    {query}
    """
    messages = [{'role': 'user', 'content': guide_prompt}]

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=100
    )

    response_content = response.choices[0].message.content

    # Give an output alongside sources
    output = f'Answer:\n\n{response_content}\n\nSources:\n\n{context_string}'
    return output

In [50]:
print(ask_query("What is an objective function?"))

Answer:

An objective function is a function that is expressed in terms of design variables and represents the goal of either minimizing effort or maximizing benefit in decision-making. It is used in the process of optimization to find the conditions that result in the maximum or minimum value of the function.

Sources:

This criterion, when expressed as a function of
the design variables, is known as objective function
The goal of all
such decisions is either to minimize effort or to maximize benefit.
The effort or the benefit can be usually expressed as a function of certain design variables.
Hence, optimization is the process of finding the conditions that give the maximum or the
minimum value of a function.
It is obvious that if a point x⋆ corresponds to the minimum value of a function f(x), the
same point corresponds to the maximum value of the function −f(x)
However, the selection of an objective
function is not trivial, because what is the optimal design with respect to a certain criterion
may
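
The instruction "use only the following context" does not say what to do when the context lacks the answer. One possible refinement, offered as an assumption rather than something this notebook tests, is to number the sources and give the model an explicit fallback. `build_prompt` and the fallback wording are hypothetical.

```python
def build_prompt(query, context_list):
    """Build a context-only prompt with numbered sources and a fallback."""
    numbered = "\n".join(
        f"[{number}] {text}" for number, text in enumerate(context_list, start=1)
    )
    return (
        "Use only the numbered context below to answer the query.\n"
        'If the context does not contain the answer, reply "I don\'t know."\n\n'
        f"Context:\n{numbered}\n\n"
        f"Query:\n{query}\n"
    )


print(build_prompt(
    "What is an objective function?",
    ["first retrieved sentence", "second retrieved sentence"],
))
```

Numbering the sources also makes it possible to ask the model to cite which sentence supports its answer, though whether it does so reliably would need to be checked.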