<a href="https://colab.research.google.com/github/pschorey/Valpo_IT533/blob/main/IT_533_LargeLanguageModels2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. What are Large Language Models?**
Large Language Models (LLMs) are artificial intelligence systems that understand and generate human-like text. They primarily utilize a type of Neural Network architecture called **Transformers**. Transformers have revolutionized natural language processing tasks and have been instrumental in the development of powerful LLMs like GPT-3 (Generative Pre-trained Transformer 3).

## **What are Transformer Networks?**

The key feature of Transformers is their ability to capture the relationships and dependencies between words or tokens in a sequence. They achieve this through a mechanism called self-attention. Self-attention allows each word in the input sequence to attend to and weigh its importance with respect to other words, enabling the model to effectively capture long-range dependencies.
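
To make the idea of words "attending to" each other concrete, here is a minimal sketch of self-attention in plain Python. It rests on simplifying assumptions: the token vectors are made-up numbers, and the queries, keys, and values are the input vectors themselves, whereas a real Transformer first multiplies the inputs by learned weight matrices. The function names are illustrative and do not come from any library.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified self-attention: each output is a weighted average of all token vectors.

    Assumption: queries, keys, and values are the inputs themselves
    (no learned projection matrices, unlike a real Transformer).
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token, including itself (scaled dot product).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # Blend all token vectors using the attention weights.
        output = [sum(w * v[i] for w, v in zip(weights, vectors))
                  for i in range(dim)]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-dimensional "token" vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. Tokens with similar vectors receive larger weights, which is how related words can influence each other regardless of how far apart they are.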

The original Transformer has an encoder-decoder structure, but many LLMs use only one half of it: GPT-style models use only the decoder, while models such as BERT use only the encoder. In either case, the network takes in a sequence of words (tokens) as input and processes them in parallel, capturing the contextual information of each word based on the surrounding words (in a decoder-only model, based only on the words that come before it). This contextual information is then used to make predictions or generate text.

We have already learned about Neural Networks in this course, and you have watched the instructor video on [classic NLP concepts](https://www.youtube.com/watch?v=BRIEHhcCi_8). The image below shows how Transformers unite both concepts. 

<img src = "https://miro.medium.com/v2/resize:fit:720/format:webp/1*57LYNxwBGcCFFhkOCSnJ3g.png">

This isn't just ONE Neural Network now; it's several of them in sequence. You might already recognize a number of familiar terms, such as Softmax in the Output layer, the feed-forward mechanism, and the "Add & Norm" steps, which add a layer's input back to its output and then normalize the result (loosely comparable to the summation and scaling steps in a regular neuron). To step through this transformer network graphic piece by piece, take a look at [this article](https://towardsdatascience.com/transformer-neural-network-step-by-step-breakdown-of-the-beast-b3e096dc857f) and watch the video below.



In [None]:
from IPython.display import HTML # This is just for me so I can embed videos
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/MQnJZuBGmSQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')
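
Two of the labels in the Transformer diagram, Softmax and "Add & Norm", can be sketched in a few lines of plain Python. This is a simplified illustration, not a real implementation: the numbers are made up, the layer normalization omits the learned scale and shift parameters, and the `temperature` argument is included only to show how dividing the scores changes how peaked the probabilities are.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # Subtract the max for numerical stability.
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def add_and_norm(x, sublayer_out, eps=1e-5):
    """Residual 'Add' followed by layer normalization (no learned scale/shift)."""
    added = [a + b for a, b in zip(x, sublayer_out)]
    mean = sum(added) / len(added)
    var = sum((a - mean) ** 2 for a in added) / len(added)
    return [(a - mean) / math.sqrt(var + eps) for a in added]

logits = [2.0, 1.0, 0.1]                  # Made-up scores for three candidate words.
probs = softmax(logits)                   # Default temperature of 1.0.
sharp = softmax(logits, temperature=0.5)  # Lower temperature: more peaked.
normed = add_and_norm([1.0, 2.0, 3.0], [0.5, -0.5, 1.0])

print([round(p, 3) for p in probs])
print([round(p, 3) for p in sharp])
print([round(v, 3) for v in normed])
```

The lower temperature concentrates more probability on the highest-scoring word, and the normalized vector has a mean of zero, which helps keep values in a stable range as they pass through many stacked layers.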



## **How Do Transformer Networks Know What They Know?**

LLMs are trained on massive amounts of text data, like books and articles, to learn the patterns and structures of human language. By analyzing this data, LLMs can generate coherent and contextually relevant text based on the input they receive.

Transformer networks are typically trained using a **two-step process**: pre-training and fine-tuning. This process involves exposing the model to a large amount of text data and optimizing its parameters to learn the statistical patterns and relationships within the data.

**1. Pre-training:**

During pre-training, the Transformer network is trained on a massive corpus of unlabeled text data. The objective is to develop a general understanding of language and capture the underlying patterns. The training data can include books, articles, websites, and other sources of text. 

The pre-training process typically involves predicting one of the following:
* Missing or masked words within sentences. The model is presented with a sequence of words, and a portion of those words is randomly replaced with special "mask" tokens. The model's task is to predict the original words based on the context provided by the surrounding words. This task is known as "**masked language modeling**" and is used by encoder models such as BERT.
* The next word in a sequence, which is known as "**next word prediction**" and is used by GPT-style models.

By training on one of these objectives, the model learns to capture the statistical regularities and relationships in the text data.
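
The two pre-training objectives above can be illustrated by showing how training examples might be built from a single sentence. This is a minimal sketch under stated assumptions: words are split on spaces (real models use subword tokenizers), and the `[MASK]` string, the masking probability, and the function names are illustrative choices rather than any library's API.

```python
import random

MASK = "[MASK]"  # Assumed placeholder token for this sketch.

def masked_lm_example(tokens, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with MASK.

    Returns the masked input and a dict mapping each masked position
    to the original token the model would have to predict.
    """
    rng = rng or random.Random()
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets[i] = tok
        else:
            inputs.append(tok)
    return inputs, targets

def next_word_examples(tokens):
    """Every prefix of the sequence is a context; the following token is the target."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

sentence = "the cat sat on the mat".split()

masked_inputs, masked_targets = masked_lm_example(
    sentence, mask_prob=0.3, rng=random.Random(0)
)
print(masked_inputs, masked_targets)

pairs = next_word_examples(sentence)
for context, target in pairs:
    print(context, "->", target)
```

Neither objective needs human-written labels: the targets come from the text itself, which is why pre-training can use very large unlabeled corpora.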

**2. Fine-tuning:**

After pre-training, the Transformer network is further refined through a process called fine-tuning. Fine-tuning involves training the model on a more specific, labeled dataset that is relevant to the target task. This dataset is typically smaller and tailored to a specific application, such as machine translation or text classification.

During fine-tuning, the model's parameters are adjusted based on the labeled data using techniques like backpropagation and [gradient descent](https://www.ibm.com/topics/gradient-descent). The objective is to optimize the model's performance on the specific task by minimizing a defined loss or error function.
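
As a rough illustration of what "adjusting parameters to minimize a loss" means, here is a minimal gradient descent sketch on a model with only two parameters. The data, learning rate, and step count are made-up values for illustration. In a real Transformer the same idea applies to millions or billions of parameters, with backpropagation computing the gradients through all the layers instead of the hand-written formulas used here.

```python
def mse(data, w, b):
    """Mean squared error of the line y = w*x + b on the labeled data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train_linear(data, lr=0.05, steps=1000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the loss with respect to each parameter.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Small labeled dataset generated from y = 2x + 1 (illustrative values).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

print("loss before training:", mse(data, 0.0, 0.0))
w, b = train_linear(data)
print("learned w, b:", round(w, 3), round(b, 3))
print("loss after training:", mse(data, w, b))
```

Each step nudges the parameters in the direction that lowers the error on the labeled examples, which is the same loop that fine-tuning runs at a much larger scale.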

Fine-tuning allows the model to adapt its pre-learned language understanding to the specific nuances and requirements of the target task. It adjusts the model's weights and biases so that the model makes more accurate predictions or generates more relevant text in the desired domain.

**3. Training Data:**

The availability and quality of labeled data play a crucial role in the fine-tuning process, and the quality of the initial training data determines the reliability of the entire model. The model's performance can significantly improve with a large and diverse labeled dataset that is representative of the task at hand.

**NOTE**: It is always important to be aware of the **SCOPE AND QUALITY** of the training data used. As for **scope**, for example, the last major training event for ChatGPT occurred in 2021, so ChatGPT cannot speak about data and events after 2021 with reliable confidence. Scope also includes any biases in how the training data was assembled. Imagine a training data set containing images of all Presidents of the United States; if we asked for a picture of the next President of the United States, which race would a Transformer Network most likely predict? As for **quality**, note that, as with other neural networks, Garbage In means Garbage Out.
<center>
<img src = "https://i0.wp.com/marketbusinessnews.com/wp-content/uploads/2017/11/GIGO-garbage-in-garbage-out-definiion-and-illustration.jpg?w=904&ssl=1" width = 400>
</center>

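The point about scope and bias can be shown with a deliberately tiny stand-in for a model: one that "trains" by counting labels and always predicts the most frequent one. The labels and proportions are made up, and a Transformer is far more sophisticated, but the same tendency holds: a model can only reproduce patterns that are present in its training data.

```python
from collections import Counter

def train_majority_model(labels):
    """'Train' by counting labels; the model predicts the most frequent one."""
    counts = Counter(labels)
    prediction = counts.most_common(1)[0][0]
    return prediction, counts

# Skewed training data: 95 examples of one group, 5 of another (made-up numbers).
skewed = ["A"] * 95 + ["B"] * 5
prediction, counts = train_majority_model(skewed)

print("training counts:", dict(counts))
print("model's prediction for the next example:", prediction)
```

Nothing in the code is "wrong", yet the prediction simply mirrors the imbalance in the data it was given.
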
In summary, Transformer networks are trained through a two-step process: pre-training and fine-tuning. Pre-training involves exposing the model to a large corpus of unlabeled text data and training it to predict masked words or the next word in a sequence. Fine-tuning involves further training the model on a labeled dataset specific to the target task, refining its parameters and adapting its language understanding to the task's requirements. The most important thing to keep in mind is the power of GIGO: garbage in, garbage out.

# **2. How to Use ChatGPT with your Colab Notebook**

While clear and precise prompt engineering goes a long way toward getting good results from ChatGPT, the ChatGPT web interface isn't always directly available to you or your users, and you may want to build the model into a chatbot or an automated telephone tree. Here is how to access it through an API.

To start, please take a look around https://platform.openai.com/overview. 

In [1]:
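# First, we need to install and then import the library.
# The version is pinned to the one shown in the install log below because
# the openai.Completion interface used in this notebook was removed in openai 1.0.
!pip install openai==0.27.8

import openai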

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting openai
  Downloading openai-0.27.8-py3-none-any.whl (73 kB)
Collecting aiohttp (from openai)
  Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting multidict<7.0,>=4.5 (from aiohttp->openai)
  Downloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)
Collecting async-timeout<5.0,>=4.0.0a3 (from aiohttp->openai)
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting yarl<2.0,>=1.0 (from aiohttp->openai)
  Downloadin

To connect to the OpenAI API, you will need to set up your user credentials, which means creating an API key. Here is how to do this:
1. Go to https://platform.openai.com/examples
2. Sign in with your OpenAI account (the same one you use for ChatGPT)
3. Follow the steps in the video below. The code sample is underneath the video.

In [None]:
from IPython.display import HTML # This is just for me so I can embed videos
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/AovPXtUN9j8?hd=1" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')



In [3]:
# Here is my code
openai.api_key = ''  # Enter your own API key between the single quotes

# Define a function to send a prompt and receive a response from the model
def chat_with_gpt(prompt):
    response = openai.Completion.create(
        model='text-davinci-003',  # Choose the language model (a completion model such as 'text-davinci-003')
        prompt=prompt,
        max_tokens=50,  # Adjust the length of the response
        temperature=0.7,  # Adjust the randomness of the response, higher values for more randomness
        n=1,  # Number of responses to generate
        stop=None,  # Specify a stop sequence to end the response, e.g., '###'
        timeout=15  # Specify a timeout value (in seconds) to limit the API call duration
    )
    
    # Extract and return the generated response
    return response.choices[0].text.strip()

# Example usage
prompt = "What is the capital of France?"
response = chat_with_gpt(prompt)
print(response)


RateLimitError: ignored
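
A `RateLimitError` like the one above can mean either that requests were sent too quickly or that the account has no remaining quota. Retrying helps only in the first case. The sketch below shows one common pattern, retrying with exponential backoff, using only the standard library. The helper name `call_with_backoff`, the `FakeRateLimitError` class, and the `flaky_call` function are hypothetical stand-ins so the example runs without an API key.

```python
import random
import time

def call_with_backoff(fn, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error, wait longer each time and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the original error.
            # Wait base_delay, then 2x, 4x, ... plus a little random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Hypothetical stand-in for an API call that fails twice before succeeding.
class FakeRateLimitError(Exception):
    pass

attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError("rate limit reached")
    return "Paris"

result = call_with_backoff(flaky_call, retry_on=(FakeRateLimitError,), base_delay=0.01)
print(result, "after", attempts["count"], "attempts")
```

With the 0.x version of the `openai` library used in this notebook, the same helper could plausibly wrap the real call, for example `call_with_backoff(lambda: chat_with_gpt(prompt), retry_on=(openai.error.RateLimitError,))`. If the error is caused by an exhausted quota, the fix is on the billing page of your OpenAI account, not in the code.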

In [4]:
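# Here is the code from Ronnie Sheer's video
import os
import openai

# Read the key from the OPENAI_API_KEY environment variable if it is set;
# otherwise, enter your API key between the single quotes
openai.api_key = os.getenv("OPENAI_API_KEY", '')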

response = openai.Completion.create(
  model="text-davinci-003",
  prompt="Summarize this for a second-grade student:\n\nJupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. It is named after the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough for its reflected light to cast visible shadows,[20] and is on average the third-brightest natural object in the night sky after the Moon and Venus.",
  temperature=0.7,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)
print(response)

AuthenticationError: ignored
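
An `AuthenticationError` like the one above is what an empty or invalid API key produces. Once a valid key is in place, `print(response)` shows the full JSON-like response object, not just the generated text. The sketch below shows how the text might be pulled out of such a response. The `sample_response` dictionary is hand-written with placeholder values to mimic the general shape of a completion response (a `choices` list whose items carry `text` and `finish_reason`); it is not real API output, and `extract_text` is a hypothetical helper.

```python
# Hand-written placeholder mimicking the shape of a completion response.
# The values are illustrative only, not real API output.
sample_response = {
    "object": "text_completion",
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "\n\nThis is a placeholder summary.",
            "index": 0,
            "finish_reason": "stop",
        }
    ],
}

def extract_text(response):
    """Return the first completion's text and whether it hit the max_tokens limit."""
    choice = response["choices"][0]
    # A finish_reason of "length" means the output was cut off by max_tokens.
    truncated = choice.get("finish_reason") == "length"
    return choice["text"].strip(), truncated

text, truncated = extract_text(sample_response)
print(text)
if truncated:
    print("(The response was cut off; consider raising max_tokens.)")
```

Checking `finish_reason` is a cheap way to notice when `max_tokens` is too small for the answer you asked for.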