# Introduction to Large Language Models
* **Created by:** Eric Martinez
* **For:** 3351 - AI-Powered Applications
* **At:** University of Texas Rio-Grande Valley

In [31]:
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env

True

Install Dependencies

In [None]:
%pip install -U --quiet openai python-dotenv tiktoken gradio

Create an OpenAI client

In [11]:
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("OPENAI_API_BASE"),
    api_key=os.environ.get("OPENAI_API_KEY"),
)

def completion(prompt, max_tokens=None, temperature=0):
    """Send a raw prompt to a completion (non-chat) model and return its continuation."""
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=max_tokens,  # None leaves the length limit to the API's default
        temperature=temperature,  # 0 favors the most likely tokens (near-deterministic)
    )
    return response.choices[0].text.strip()

def chat_completion(message, model="gpt-3.5-turbo", prompt=None, temperature=0):
    """Send a user message (plus an optional system prompt) to a chat model."""
    # Initialize the messages list
    messages = []

    # Add the system prompt, if one was given
    if prompt is not None:
        messages.append({"role": "system", "content": prompt})

    # Add the user's message
    messages.append({"role": "user", "content": message})

    # Call the Chat Completions endpoint with the model and messages
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )

    # Extract and return the model's reply
    return response.choices[0].message.content.strip()

## Overview of LLMs and their capabilities

An LLM is a machine learning model designed to understand and generate human-like text. LLMs are trained on vast amounts of text data and have been shown to perform a wide range of tasks, such as translation, summarization, and question answering.

We still don't know the full extent of the capabilities or limitations of large language models.

New discoveries are made every day.

## Let's explore 'completion' models

**Example:** A bloody space pirate parachuted down from

Let's mess around in the OpenAI Playground
- discuss 'sampling' from a probability distribution
- the relationship between temperature, logprobs, and the output
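The Playground's temperature slider and logprobs view both come from the same idea: the model assigns a score (a logit) to every candidate next token, the scores are turned into a probability distribution, and one token is sampled from it. The sketch below illustrates that step in plain Python. The candidate tokens and logits are made up, and dividing logits by the temperature before a softmax is the standard textbook formulation, not necessarily OpenAI's exact implementation. The logprobs shown in the Playground are the logarithms of probabilities like these.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding for 0")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens, logits, temperature, rng=random):
    """Pick the next token: argmax at temperature 0, otherwise sample."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return tokens[best]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidate next tokens with made-up logits (not real model output)
candidates = [" the", " a", " orbit", " space"]
logits = [3.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print(sample_next_token(candidates, logits, temperature=0))  # always " the"
```

Lower temperatures sharpen the distribution toward the top token; higher temperatures flatten it so less likely tokens get picked more often. At temperature 0 the sketch falls back to greedy decoding, which is roughly why the `completion` helper (called with `temperature=0`) gives repeatable answers.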

## How LLMs understand text (Tokenization)

**Example:** A bloody space pirate parachuted down from

Let's mess around in the OpenAI Playground

Tokenization

In [12]:
from tiktoken import encoding_for_model
enc = encoding_for_model("gpt-3.5-turbo-instruct")
toks = enc.encode("A bloody space pirate parachuted down from")
toks

[32, 36277, 3634, 55066, 75045, 2844, 1523, 505]

In [13]:
[enc.decode_single_token_bytes(o).decode('utf-8') for o in toks]

['A', ' bloody', ' space', ' pirate', ' parach', 'uted', ' down', ' from']
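tiktoken is a byte pair encoding (BPE) tokenizer: starting from small units, it repeatedly merges the most frequent adjacent pair into a new token. That is roughly why a common word like `' pirate'` is a single token while `' parachuted'` splits into `' parach'` and `'uted'`. The toy learner below illustrates the merge loop on characters. It is a simplified sketch (real BPE vocabularies are learned over bytes from enormous corpora), and `learn_bpe` and the sample corpus are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Each word starts as a tuple of single characters
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


merges, vocab = learn_bpe("parachute parachuted parade para", num_merges=4)
print(merges)
print(vocab)
```

After a few merges, the frequent prefix `para` becomes one symbol in every word, while rarer endings stay split into smaller pieces.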

Next-Token Prediction

In [14]:
print(completion("A bloody space pirate parachuted down from"))

the sky, landing in front of the group. He was a tall, muscular
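Next-token prediction works one token at a time: predict the next token, append it to the text, and repeat. The sketch below shows that loop with a toy word-level bigram model standing in for the LLM. This is an assumption for illustration; a real LLM conditions on the whole context, not just the previous word. The `max_tokens` cap plays the same role as the API parameter of that name. The legacy Completions endpoint defaults to 16 tokens, which likely explains why the output above stops mid-sentence.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_tokens=5):
    """Greedy next-token prediction: repeatedly append the most likely next word."""
    words = prompt.split()
    for _ in range(max_tokens):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the toy model has never seen this word, so it stops
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical tiny training corpus (a real LLM trains on vastly more text)
corpus = "the pirate jumped down from the sky . the pirate landed on the ship ."
model = train_bigram(corpus)
print(generate(model, "down from", max_tokens=4))
```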


## Are LLMs just fancy autocomplete?

In [15]:
prompt = "The quick brown fox"

print(completion(prompt))

jumps over the lazy dog


The quick brown fox jumps over the lazy dog


In [16]:
prompt = "We wish you a merry"

print(completion(prompt))

Christmas

We wish you a merry Christmas

We wish you a merry Christmas


In [21]:
prompt = """
Chicken coop design is the discipline of
"""

print(completion(prompt))

planning and creating a structure that is suitable for housing and raising chickens. It


## Power of Next-Token Prediction

In [22]:
prompt = """
Tweet: Decided to play with the new arduino for the first time - sooo easy to get started Now having fun with code and flashy lights!	
Sentiment: Positive

Tweet: Hoping I at least have fun 2nite. Today was 1 horrible way 2 start off a birthday
Sentiment: Negative

Tweet: my poor puppy is a litle depress	
Sentiment: Negative

Tweet: Goodnight and sweet dreams to you also
Sentiment: Positive

Tweet: Wish today was over
""" 
print(completion(prompt))

Sentiment: Negative
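The few-shot prompt above is just labeled examples concatenated into one string; the model continues the pattern. A small helper can build such prompts from data. This is a hypothetical sketch: `build_few_shot_prompt` is not part of any library, and unlike the prompt above it ends with `Sentiment:` so the expected continuation is the bare label rather than a whole `Sentiment: ...` line.

```python
def build_few_shot_prompt(examples, query, input_label="Tweet", output_label="Sentiment"):
    """Format labeled (text, label) pairs as a few-shot prompt, ending with the query."""
    blocks = [f"{input_label}: {text}\n{output_label}: {label}" for text, label in examples]
    # End with the unlabeled query plus the output label, so the most
    # likely continuation is just the label itself
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


examples = [
    ("Goodnight and sweet dreams to you also", "Positive"),
    ("my poor puppy is a litle depress", "Negative"),
]
print(build_few_shot_prompt(examples, "Wish today was over"))
```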


In [23]:
prompt = """
# Ruby

people = ["Bob", "Sally"]

# Loop over each person and greet them
"""
print(completion(prompt))

people.each do |person|
  puts "Hello, #{person}!"
end


Ilya Sutskever (OpenAI) on Next-Token Prediction

https://www.youtube.com/watch?v=kZ-e_WtxP64

## Naive Next-Token Prediction Models (Base Models)
- **Task:** Given some random piece of text from the internet, predict what is likely to come next
- **Benefits:** Learns a lot about the world in the process
- **Downsides:** Isn't well suited to performing tasks or interacting with humans, and isn't trained for factuality or harmlessness

## Beyond Naive Next-Token Prediction
- Start at base model
- Fine-tune the model to follow instructions, answer according to human preferences, minimize harm
- Led to GPT-3.5 and GPT-4
- Tend to be called "Chat" models

## Interacting with Chat Models via OpenAI API

#### Power of 'Chat' fine-tuning

In [24]:
prompt = "You will be handed a tweet, classify the tweet as 'Positive' or 'Negative'"
message = "Wish today was over"

print(chat_completion(message, prompt=prompt))

Negative
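Chat models take a list of role-tagged messages instead of a single string; the Chat Completions API accepts `system`, `user`, and `assistant` roles. The sketch below mirrors how `chat_completion` builds that list and adds an optional `history` of earlier turns. The `history` parameter and `build_messages` are hypothetical (the helper above sends only a system prompt and one user message), and no API call is made here.

```python
def build_messages(message, prompt=None, history=None):
    """Assemble the `messages` list for a chat completion request.

    history: optional list of (user_text, assistant_text) pairs from earlier
    turns (a hypothetical extension, not part of chat_completion above).
    """
    messages = []
    if prompt is not None:
        messages.append({"role": "system", "content": prompt})
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": message})
    return messages


history = [("Goodnight and sweet dreams to you also", "Positive")]
messages = build_messages(
    "Wish today was over",
    prompt="You will be handed a tweet, classify the tweet as 'Positive' or 'Negative'",
    history=history,
)
for m in messages:
    print(m["role"], "->", m["content"])
```

Including earlier turns this way is how a chat model "remembers" a conversation: the whole transcript is resent with each request.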


In [25]:
prompt = "You are a helpful programming assistant."
message = "Write a Ruby program that makes an array with two names, Bob and Sally. Then it should greet each one of them."

print(chat_completion(message, prompt=prompt))

Here's a Ruby program that creates an array with two names, Bob and Sally, and greets each one of them:

```ruby
names = ["Bob", "Sally"]

names.each do |name|
  puts "Hello, #{name}!"
end
```

When you run this program, it will output:

```
Hello, Bob!
Hello, Sally!
```

This program uses the `each` method to iterate over each element in the `names` array. Inside the loop, it prints a greeting message using string interpolation to include the current name.


In [26]:
prompt = """
You are a helpful programming assistant. 
Never provide a narrative response. 
Only provide the code the user asks for.
"""

message = "Write a Ruby program that makes an array with two names, Bob and Sally. Then it should greet each one of them."

print(chat_completion(message, prompt=prompt))

```ruby
names = ["Bob", "Sally"]

names.each do |name|
  puts "Hello, #{name}!"
end
```
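Even when told to return only code, the model above still wrapped its answer in a Markdown code fence. An application usually needs the bare code, so a small post-processing step helps. The regex-based `extract_code_blocks` below is a hypothetical sketch that assumes each fence starts its own line, optionally followed by a language name.

```python
import re

# Build the three-backtick fence programmatically so this example
# contains no literal fence of its own
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(response):
    """Return the body of every fenced code block in a chat model's response."""
    return [body.strip() for body in CODE_BLOCK.findall(response)]


reply = FENCE + 'ruby\nnames = ["Bob", "Sally"]\n' + FENCE
print(extract_code_blocks(reply))
```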


## What are LLMs good at?

- Writing
- Editing
- Coding
- Extracting information from text
- Summarization
- Extrapolation
- Reasoning
- Learning from examples
- Interacting with humans

## Rule of Thumb

If you can write down instructions clear enough for a human assistant to follow, then an LLM can probably do the task too.

## Implications of the Rule of Thumb

- You have to clearly communicate the problem, context, and task, with explicit guidelines on input/output combinations, formatting, constraints, and examples.
- If you are a bad communicator, you will not get good results.
- Garbage in, garbage out.
- You will need to find clever ways to represent your problem and task as text.
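One way to act on these implications is to keep the parts of a prompt (task, context, constraints, output format) as separate fields and render them in a fixed layout, so none of them gets forgotten. `PromptSpec` below is a hypothetical sketch, not a library class; labeled examples could be appended to the rendered text afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the pieces a clear prompt should spell out."""
    task: str
    context: str = ""
    constraints: list = field(default_factory=list)
    output_format: str = ""

    def render(self):
        """Join the non-empty sections into one prompt string."""
        sections = [f"Task: {self.task}"]
        if self.context:
            sections.append(f"Context: {self.context}")
        if self.constraints:
            rules = "\n".join(f"- {rule}" for rule in self.constraints)
            sections.append(f"Constraints:\n{rules}")
        if self.output_format:
            sections.append(f"Output format: {self.output_format}")
        return "\n\n".join(sections)


spec = PromptSpec(
    task="Classify the sentiment of a tweet.",
    context="Tweets are informal and may contain typos or slang.",
    constraints=["Use only the labels Positive, Negative, or Neutral."],
    output_format="A single word with no punctuation.",
)
print(spec.render())
```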

## Challenges with Current LLMs
- Tasks and knowledge not represented in the training data
- Hallucination
- Reading your mind
- Following poor instructions
- Knowledge cut-off

## What does GPT-4 know?

- USMLE Self-Assessment (overall average): 86.65%
- USMLE Sample Exam: 86.7%
- Uniform Bar Exam: 298/400 (~90th percentile)
- GRE Quantitative: 157/170 (~62nd percentile)
- GRE Verbal: 165/170 (96th percentile)
- GRE Writing: 4/6 (~54th percentile)

## How might GPT-4 have been trained?

According to leaks and rumors:

- Estimated cost to train: ~$63M
- Trained on data from the raw internet (controversial)
    * Large amount of scraped data
    * Books, journals, and scholarly articles
    * All public GitHub repos
- Refined using datasets produced by human annotators

## Goals of this course

- Learn to design and develop applications that leverage large language models
- Learn to apply software engineering practices to these types of applications to ensure production-readiness
- Learn to evaluate and validate AI applications
- Learn to apply the concepts from CI/CD to continuously validate AI applications in production
- Learn to build datasets for the purpose of evaluating and validating AI applications
- Learn how to choose evaluation metrics and develop new ones
- Learn how to apply ideas from Agile when working with stakeholders and managing AI engineering projects
- Learn to apply key ideas to build _good_ AI applications for a variety of domains
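Evaluating an AI application starts with a labeled dataset and a metric. The minimal harness below computes accuracy and collects the failures for inspection. `evaluate` and `keyword_stub` are hypothetical names; the keyword stub stands in for an LLM so the sketch runs offline, and the sample labels reuse tweets from the few-shot prompt earlier.

```python
def evaluate(classifier, dataset):
    """Return (accuracy, failures) for a list of (text, expected_label) pairs."""
    failures = []
    for text, expected in dataset:
        predicted = classifier(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    if not dataset:
        return 0.0, failures
    correct = len(dataset) - len(failures)
    return correct / len(dataset), failures


def keyword_stub(text):
    """Stand-in classifier so the harness runs offline (not an LLM)."""
    return "Negative" if "over" in text or "poor" in text else "Positive"


labeled = [
    ("Wish today was over", "Negative"),
    ("my poor puppy is a litle depress", "Negative"),
    ("Goodnight and sweet dreams to you also", "Positive"),
    ("Hoping I at least have fun 2nite. Today was 1 horrible way 2 start off a birthday", "Negative"),
]
accuracy, failures = evaluate(keyword_stub, labeled)
print(accuracy)
for text, expected, predicted in failures:
    print(f"expected {expected}, got {predicted}: {text}")
```

Passing `tweet_classifier` (defined below) instead of `keyword_stub` would score the real app, at the cost of one API call per example. Accuracy is only one possible metric; the failure list is often more useful for deciding what to change in the prompt.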

## Basic Example: Building a Tweet Classification App

In [27]:
def tweet_classifier(tweet):
    prompt = "You will be handed a tweet, classify the tweet as 'Positive', 'Negative', or 'Neutral'"

    return chat_completion(tweet, prompt=prompt)
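`tweet_classifier` returns whatever text the model produces, which may include extra punctuation, different casing, or a prefix such as `Sentiment:`. Validating the reply against the allowed labels before showing it keeps the app's output predictable. `normalize_label` is a hypothetical helper, assuming the three labels named in the system prompt.

```python
ALLOWED_LABELS = ("Positive", "Negative", "Neutral")


def normalize_label(raw_output):
    """Map a model's free-text reply onto one allowed label, or None if it doesn't match."""
    # Drop surrounding whitespace, punctuation, and quotes, then ignore case
    cleaned = raw_output.strip().strip(".!'\"").lower()
    # Tolerate replies such as "Sentiment: Negative"
    if ":" in cleaned:
        cleaned = cleaned.rsplit(":", 1)[1].strip()
    for label in ALLOWED_LABELS:
        if cleaned == label.lower():
            return label
    return None


print(normalize_label("Sentiment: Negative"))
```

For example, `normalize_label(tweet_classifier(text))` returns `None` for anything outside the three labels, which the app could treat as an error instead of displaying it.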

In [29]:
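import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("# Tweet Classifier")
    with gr.Row():
        with gr.Column():
            tweet = gr.Textbox(label="Tweet")
        with gr.Column():
            sentiment = gr.Textbox(label="Sentiment", interactive=False)
    with gr.Row():
        btn = gr.Button(value="Submit")
    # Run tweet_classifier on the tweet text and show the result in the Sentiment box
    btn.click(tweet_classifier, inputs=[tweet], outputs=[sentiment])

# Launch after the Blocks context closes so the layout is fully defined;
# share=True also requests a temporary public link
demo.launch(share=True)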

Running on local URL:  http://127.0.0.1:7861

