# GPT-2 and ChatGPT, the Origins
copyright 2023 Denis Rothman, MIT License

**September 10, 2023 update**    
davinci-instruct-beta was replaced by davinci-002 in this notebook.  
Note: GPT model responses are stochastic, so they may vary from one run to another.

Conversational transformer-driven chatbots are progressing.
GPT-2 was a good start. Now OpenAI has produced ChatGPT (and other companies have released similar chatbots). ChatGPT is built on top of OpenAI's [instruct series](https://openai.com/blog/chatgpt/), [InstructGPT](https://openai.com/blog/chatgpt/).

This notebook takes you from GPT-2 to the ever-evolving world of conversational chatbots and shows you an example of how you can implement one:<br>
**1. Text generation with a Hugging Face GPT-2 model**<br>
**2. ChatGPT (GPT-3), the Origins**<br>
**3. Next Steps**




# 1. GPT-2


## 1.1. Source code for transformers.modeling_gpt2

[Hugging Face and OpenAI provide the source code of the GPT-2 model (transformers v3.5.1) for those interested in how transformers are coded.](https://huggingface.co/transformers/v3.5.1/_modules/transformers/modeling_gpt2.html)

You can read the code along with Transformers for NLP, 2nd Edition:<br>
- Chapter 2 Getting Started with the Architecture of the Transformer Model
- Chapter 7 The Rise of Suprahuman Transformers with GPT-3 Engines. 

Although we don't have access to the source code of GPT-3, the GPT-2 source code can give you an idea of how transformers are built.

Below is the constructor (`__init__`) of GPT-2's `Attention` class. The cell wraps the code in a string so that it is displayed, not executed.

In [1]:
'''
class Attention(nn.Module):
    def __init__(self, nx, n_ctx, config, scale=False, is_cross_attention=False):
        super().__init__()

        n_state = nx  # in Attention: n_state=768 (nx=n_embd)
        # [switch nx => n_state from Block to Attention to keep identical to TF implem]
        assert n_state % config.n_head == 0
        self.register_buffer(
            "bias", torch.tril(torch.ones((n_ctx, n_ctx), dtype=torch.uint8)).view(1, 1, n_ctx, n_ctx)
        )
        self.register_buffer("masked_bias", torch.tensor(-1e4))
        self.n_head = config.n_head
        self.split_size = n_state
        self.scale = scale
        self.is_cross_attention = is_cross_attention
        if self.is_cross_attention:
            self.c_attn = Conv1D(2 * n_state, nx)
            self.q_attn = Conv1D(n_state, nx)
        else:
            self.c_attn = Conv1D(3 * n_state, nx)
        self.c_proj = Conv1D(n_state, nx)
        self.attn_dropout = nn.Dropout(config.attn_pdrop)
        self.resid_dropout = nn.Dropout(config.resid_pdrop)
        self.pruned_heads = set()
'''

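To see what the constructor prepares, here is a minimal NumPy sketch of the computation these buffers and layers support: `c_attn` projects each token to queries, keys, and values in one matrix product, the lower-triangular `bias` buffer hides future tokens, masked scores are replaced by the `masked_bias` value (-1e4) before the softmax, and `c_proj` projects the merged heads back. It is a simplified sketch under stated assumptions (one sequence, random weights, no dropout, no cross-attention, no cached keys and values); the function names are illustrative and are not part of the `transformers` API.

```python
import numpy as np


def conv1d(x, weight, bias):
    # GPT-2's Conv1D behaves like a linear layer with transposed weights:
    # weight has shape (nx, nf), so the output is x @ weight + bias.
    return x @ weight + bias


def split_heads(x, n_head):
    # (seq_len, n_state) -> (n_head, seq_len, head_dim)
    seq_len, n_state = x.shape
    return x.reshape(seq_len, n_head, n_state // n_head).transpose(1, 0, 2)


def merge_heads(x):
    # (n_head, seq_len, head_dim) -> (seq_len, n_head * head_dim)
    n_head, seq_len, head_dim = x.shape
    return x.transpose(1, 0, 2).reshape(seq_len, n_head * head_dim)


def causal_self_attention(x, w_attn, b_attn, w_proj, b_proj, n_head, scale=True):
    seq_len, n_state = x.shape
    assert n_state % n_head == 0  # same check as in the constructor
    # c_attn: one projection to 3 * n_state, then split into q, k, v
    q, k, v = np.split(conv1d(x, w_attn, b_attn), 3, axis=-1)
    q, k, v = [split_heads(t, n_head) for t in (q, k, v)]
    scores = q @ k.transpose(0, 2, 1)  # (n_head, seq_len, seq_len)
    if scale:
        scores = scores / np.sqrt(v.shape[-1])
    # Lower-triangular mask: token i may only attend to tokens 0..i
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    scores = np.where(mask, scores, -1e4)  # -1e4 plays the role of masked_bias
    # Numerically stable softmax over the last axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values, merge the heads, then project with c_proj
    out = conv1d(merge_heads(weights @ v), w_proj, b_proj)
    return out, weights


# Toy usage with random weights (not GPT-2's trained weights)
rng = np.random.default_rng(0)
seq_len, n_state, n_head = 5, 8, 2
x = rng.normal(size=(seq_len, n_state))
w_attn = rng.normal(scale=0.1, size=(n_state, 3 * n_state))
b_attn = np.zeros(3 * n_state)
w_proj = rng.normal(scale=0.1, size=(n_state, n_state))
b_proj = np.zeros(n_state)

out, weights = causal_self_attention(x, w_attn, b_attn, w_proj, b_proj, n_head)
print(out.shape, weights.shape)  # (5, 8) (2, 5, 5)
```

Because the masked weights are exactly zero after the softmax, changing the last token leaves the outputs of all earlier tokens unchanged, which is what lets GPT-2 generate text left to right.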

## 1.2. Hugging Face GPT-2 Model

The [Hugging Face GPT-2 model](https://huggingface.co/gpt2) is a transformer you can run for text generation. 

ChatGPT also generates text, as section 2 of this notebook shows, so you can compare the outputs of GPT-2 with the ones obtained with GPT-3 models in section 2.



### 1.2.1. Text Generation

The same prompt, "Summarize the history of the Roman Empire:", is used in section 1 (GPT-2) and section 2 (GPT-3). Naturally, the results depend on the training datasets, but GPT-3 still provides better-structured outputs.

In [2]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.12.0-py3-none-any.whl (190 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.12.0 tokenizers-0.13.2 transformers-4.26.1


In [3]:
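from transformers import pipeline, set_seed
#Load a text-generation pipeline with the GPT-2 checkpoint
generator = pipeline('text-generation', model='gpt2')
#Fix the random seed so that the sampled outputs are reproducible
set_seed(42)
#max_length counts the prompt tokens too; two sequences are generated
generator("Summarize the history of the Roman Empire:", max_length=300, num_return_sequences=2)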


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Summarize the history of the Roman Empire: The first recorded war between Rome and Alexandria between 850 and 550 CE. In what is now called the Wars of the Roses and the Catan, Hannibal defeated the Carthaginian defenders. Carthaginian forces stormed Constantinople, taking back the city from Rome on Sept. 2, 550, which is said to have happened after the victory speech by Emperor Justinolaus.'},
 {'generated_text': 'Summarize the history of the Roman Empire: What was the origins of the empire and how did its early leaders survive? The emperor Marcus Aurelius, known as Caligula, succeeded Emperor Scipio as son of Augustus (1635-1640). Like his father, Scipio\'s ambition was to build his empire from the ruins of the ancient Roman cities (the Valley of Carpathians, for example); but he had to compromise his religious beliefs so that his empire would survive. The first step toward the creation of the Roman Empire was mass tourism, when the Romans celebrated the birth an
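The two different sequences show that the pipeline samples the next token instead of always picking the most likely one, which is why `set_seed(42)` matters for a repeatable run. A toy sketch of one common sampling scheme (temperature plus top-k) illustrates the idea; it is not the `transformers` implementation, and the vocabulary, logits, and default values below are hypothetical. Unlike a real language model, the toy logits do not depend on the previous tokens.

```python
import numpy as np


def sample_next_token(logits, rng, top_k=50, temperature=1.0):
    """Pick one token id from a logits vector with temperature and top-k sampling."""
    logits = np.asarray(logits, dtype=float) / temperature
    top_ids = np.argsort(logits)[-top_k:]            # ids of the k largest logits
    top_logits = logits[top_ids]
    probs = np.exp(top_logits - top_logits.max())    # softmax restricted to the top k
    probs /= probs.sum()
    return int(rng.choice(top_ids, p=probs))


# Hypothetical 6-token vocabulary and fixed logits (not GPT-2's)
vocab = ["Rome", "empire", "fell", "in", "476", "."]
logits = [2.0, 1.5, 1.0, 0.5, 0.2, -1.0]


def generate(seed, steps=5, top_k=3):
    rng = np.random.default_rng(seed)  # a seeded generator plays the role of set_seed
    return [vocab[sample_next_token(logits, rng, top_k=top_k)] for _ in range(steps)]


print(generate(seed=42))
print(generate(seed=42))  # same seed, same sequence
```

With `top_k=1` the sampler always returns the highest-scoring token, which corresponds to greedy decoding; larger `top_k` values and temperatures above 1 make the output more varied.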

### 1.2.2. Features in PyTorch

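`GPT2Model` returns the raw features GPT-2 computes for the input text: `last_hidden_state` holds one 768-dimensional vector per token, and `past_key_values` holds the keys and values cached by each attention layer. Inspecting them in PyTorch shows the representations the model builds before any text is generated.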

In [4]:
from transformers import GPT2Tokenizer, GPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
text = "Summarize the history of the Roman Empire:"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
print(output)

BaseModelOutputWithPastAndCrossAttentions(last_hidden_state=tensor([[[-0.1643, -0.0088, -0.3804,  ..., -0.1008, -0.1826, -0.1549],
         [-0.5567,  0.3165, -0.5420,  ..., -0.0316,  0.1112, -0.2285],
         [-0.1847, -0.1801, -0.7099,  ..., -0.1133,  0.4296,  0.1351],
         ...,
         [-0.5366, -0.1843, -0.0568,  ..., -0.2162, -0.3745, -0.0671],
         [-0.4827, -0.3048, -0.5157,  ...,  0.1181, -0.3663, -0.3136],
         [-0.2303, -0.0685, -0.3972,  ...,  0.1381,  0.1320, -0.0852]]],
       grad_fn=<ViewBackward0>), past_key_values=((tensor([[[[-1.1883,  2.1788,  0.6307,  ..., -0.3714, -0.3075,  1.0041],
          [-2.4346,  2.1242,  1.5517,  ..., -0.0476, -1.5998,  0.8205],
          [-2.4853,  2.7003,  0.9711,  ...,  0.5021, -1.9622,  3.0207],
          ...,
          [-1.5906,  1.4587,  2.4143,  ..., -0.8270, -1.8376,  1.3219],
          [-2.3332,  1.7387,  0.9545,  ..., -0.8451, -2.0172,  0.2157],
          [-2.4316,  1.9150,  2.4052,  ..., -1.5292, -1.6696,  2.0963]],
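The printed tensors are truncated, so their shapes are easier to reason about from the model configuration. A small sketch, assuming the GPT-2 small hyperparameters (12 layers, 12 attention heads, 768-dimensional hidden states) and the 10 tokens the prompt is split into (the TensorFlow output in the next section reports `shape=(1, 10, 768)`). The helper name is hypothetical.

```python
def gpt2_output_shapes(n_tokens, batch_size=1, n_layer=12, n_head=12, n_embd=768):
    """Expected shapes of GPT2Model outputs for one forward pass (PyTorch layout)."""
    assert n_embd % n_head == 0, "n_embd must split evenly across the heads"
    head_dim = n_embd // n_head  # 768 / 12 = 64
    return {
        # one n_embd-dimensional feature vector per token
        "last_hidden_state": (batch_size, n_tokens, n_embd),
        # one (key, value) pair per transformer layer
        "num_past_key_values": n_layer,
        # each cached key or value tensor: (batch, heads, tokens, head_dim)
        "key_or_value": (batch_size, n_head, n_tokens, head_dim),
    }


shapes = gpt2_output_shapes(n_tokens=10)
for name, shape in shapes.items():
    print(name, shape)
```

In the notebook, you can check these against `output.last_hidden_state.shape` and `output.past_key_values[0][0].shape`.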

### 1.2.3. Features in TensorFlow

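`TFGPT2Model` loads the same `gpt2` checkpoint in TensorFlow. The printed `last_hidden_state` has shape (1, 10, 768): one sequence, 10 tokens, and 768 features per token, with the same values as the PyTorch output up to rounding.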

In [5]:
from transformers import GPT2Tokenizer, TFGPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = TFGPT2Model.from_pretrained('gpt2')
text = "Summarize the history of the Roman Empire:"
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
print(output)


All model checkpoint layers were used when initializing TFGPT2Model.

All the layers of TFGPT2Model were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2Model for predictions without further training.


TFBaseModelOutputWithPastAndCrossAttentions(last_hidden_state=<tf.Tensor: shape=(1, 10, 768), dtype=float32, numpy=
array([[[-0.16430704, -0.00884601, -0.38042933, ..., -0.10083465,
         -0.18255429, -0.15494455],
        [-0.5566617 ,  0.3165324 , -0.54203415, ..., -0.03163034,
          0.11117145, -0.22854325],
        [-0.18474899, -0.18006328, -0.70992076, ..., -0.11334559,
          0.42955333,  0.13506673],
        ...,
        [-0.5365959 , -0.18432102, -0.05677529, ..., -0.21624039,
         -0.37445495, -0.06711666],
        [-0.48273715, -0.30475324, -0.51566315, ...,  0.11814982,
         -0.36633205, -0.31362683],
        [-0.23026578, -0.06851458, -0.3971817 , ...,  0.13807619,
          0.13197333, -0.08524485]]], dtype=float32)>, past_key_values=(<tf.Tensor: shape=(2, 1, 12, 10, 64), dtype=float32, numpy=
array([[[[[-1.18831587e+00,  2.17882562e+00,  6.30665720e-01, ...,
           -3.71371508e-01, -3.07469904e-01,  1.00413001e+00],
          [-2.43463755e+00,  2.12
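Both frameworks load the same `gpt2` weights, so their features should agree up to floating-point precision. A minimal check, using a few values copied from the two printed outputs (the first and last token rows, first three and last three values); PyTorch prints 4 decimals, hence the tolerance. The helper name `features_match` is hypothetical.

```python
import numpy as np


def features_match(a, b, atol=1e-4):
    """True if two feature arrays have the same shape and agree within atol."""
    a, b = np.asarray(a), np.asarray(b)
    return a.shape == b.shape and bool(np.allclose(a, b, atol=atol))


# Values copied from the PyTorch and TensorFlow printouts of last_hidden_state
pytorch_rows = [[-0.1643, -0.0088, -0.3804, -0.1008, -0.1826, -0.1549],
                [-0.2303, -0.0685, -0.3972, 0.1381, 0.1320, -0.0852]]
tensorflow_rows = [[-0.16430704, -0.00884601, -0.38042933, -0.10083465, -0.18255429, -0.15494455],
                   [-0.23026578, -0.06851458, -0.3971817, 0.13807619, 0.13197333, -0.08524485]]

print(features_match(pytorch_rows, tensorflow_rows))  # True
```

To compare the full tensors, keep the two outputs under different names (the notebook reuses `output`), then pass `last_hidden_state.detach().numpy()` from the PyTorch output and `last_hidden_state.numpy()` from the TensorFlow output.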

# 2. ChatGPT (GPT-3), the Origins

a) OpenAI explains the origins of ChatGPT:
"We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response."

Instruct series expanded:
https://openai.com/blog/chatgpt/

Instruct Series:
https://platform.openai.com/docs/model-index-for-researchers

b) So let's go back to the origins and use a GPT-3 model, davinci-002 (which replaced davinci-instruct-beta in this notebook), as a conversational bot. Then we will compare the output with ChatGPT.

Note:
Transformers for NLP, 2nd Edition, Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines describes how to use GPT-3.

In [6]:
#Importing openai (the cells below use the pre-1.0 openai.Completion API)
try:
  import openai
except ImportError:
  !pip install openai==0.28
  import openai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting openai
  Downloading openai-0.26.5.tar.gz (55 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: openai
  Building wheel for openai (pyproject.toml) ... done
  Created wheel for openai: filename=openai-0.26.5-py3-none-any.whl size=67620 sha256=5e2468fa5558301adf607e5c462984109d19dd3099511988c2614aa4a7b99714
  Stored in directory: /root/.cache/pip/wheels/a7/47/99/8273a59fbd59c303e8ff175416d5c1c9c03a2e83ebf7525a99
Successfully built openai
Installing collected packages: openai
Successfully installed openai-0.26.5


In [7]:
#Store your key in a file and read it (you can type it directly in the notebook, but it will be visible to anybody next to you)
from google.colab import drive
drive.mount('/content/drive')
!cp  "drive/MyDrive/files/api_key.txt" "api_key.txt"
with open("api_key.txt", "r") as f:
  API_KEY=f.readline().strip()  #strip() removes the trailing newline kept by readline()

#The OpenAI Key
import os
os.environ['OPENAI_API_KEY'] =API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")


Mounted at /content/drive
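If you run the notebook outside Colab, a small helper can read the key from the `OPENAI_API_KEY` environment variable first and fall back to a key file. This is a sketch; the function name `load_api_key` and the lookup order are assumptions, not part of the `openai` package.

```python
import os


def load_api_key(path=None, env_var="OPENAI_API_KEY"):
    """Return the API key from env_var, else from the first line of the file at path."""
    key = os.environ.get(env_var, "").strip()
    if not key and path is not None:
        with open(path, "r") as f:
            key = f.readline().strip()  # drop the trailing newline readline() keeps
    if not key:
        raise ValueError(f"No API key found in ${env_var} or {path!r}")
    return key


# Usage (hypothetical): openai.api_key = load_api_key("api_key.txt")
```

Stripping whitespace matters because a key read with `readline()` ends with a newline, which the API would treat as part of the key.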


In [8]:
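#GPT-3 parameters; p2 is the same prompt as the one given to ChatGPT
p1="davinci-002"   #model
p2="Summarize the history of the Roman Empire:"   #prompt
p3=0     #temperature
p4=120   #max_tokens
p5=1     #top_p
p6=0     #frequency_penalty
p7=0     #presence_penalty

#model= is the parameter name used by the OpenAI Completions API
response = openai.Completion.create(model=p1,prompt=p2,temperature=p3,max_tokens=p4,top_p=p5,frequency_penalty=p6,presence_penalty=p7)
r = (response["choices"][0])
print(r["text"])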



The Roman Empire was founded in 27 BC by Augustus Caesar. It was a time of peace and prosperity. The Roman Empire was divided into two parts in 395 AD. The Western Roman Empire fell in 476 AD. The Eastern Roman Empire, also known as the Byzantine Empire, lasted until 1453 AD.
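The names p1 to p7 hide what each setting does. A hypothetical helper, `completion_params`, bundles the same values under their Completions API parameter names so the call reads on its own; the one-line descriptions in the comments are brief summaries, not the official definitions.

```python
def completion_params(prompt, model="davinci-002", temperature=0, max_tokens=120,
                      top_p=1, frequency_penalty=0, presence_penalty=0):
    """Bundle the p1..p7 values of the cell above under their API parameter names."""
    return {
        "model": model,                          # p1: the model to query
        "prompt": prompt,                        # p2: the input text
        "temperature": temperature,              # p3: 0 = least random token choice
        "max_tokens": max_tokens,                # p4: upper bound on generated tokens
        "top_p": top_p,                          # p5: nucleus sampling mass (1 = no cut-off)
        "frequency_penalty": frequency_penalty,  # p6: penalize often-repeated tokens
        "presence_penalty": presence_penalty,    # p7: penalize tokens already present
    }


params = completion_params("Summarize the history of the Roman Empire:")
# With the pre-1.0 openai package used in this notebook:
# response = openai.Completion.create(**params)
print(params)
```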


In [9]:
#Follow-up question. The Completion endpoint is stateless,
#so this prompt does not include the previous exchange.
p1="davinci-002"
p2="What are the contributions of the Roman Empire:"
p3=0
p4=120
p5=1
p6=0
p7=0

response = openai.Completion.create(model=p1,prompt=p2,temperature=p3,max_tokens=p4,top_p=p5,frequency_penalty=p6,presence_penalty=p7)
r = (response["choices"][0])
print(r["text"])



The Roman Empire contributed to the spread of Christianity, the establishment of a common language, and the development of a system of law.
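The Completion endpoint does not remember earlier calls, so the follow-up question above is answered without the first exchange. One common way to carry a dialog with a completion model is to resend the previous turns inside the prompt. A minimal sketch; the `Q:`/`A:` layout and the helper name are assumptions, not a format required by the API.

```python
def build_dialog_prompt(history, question):
    """Concatenate previous (question, answer) turns and the new question into one prompt."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"Q: {past_question}")
        lines.append(f"A: {past_answer}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # cue for the model to write the next answer
    return "\n".join(lines)


history = [("Summarize the history of the Roman Empire:",
            "The Roman Empire was founded in 27 BC by Augustus Caesar.")]
prompt = build_dialog_prompt(history, "What are the contributions of the Roman Empire?")
print(prompt)
# The prompt could then be sent as before, for example with stop=["\nQ:"]
# so that the model does not write the next question itself.
```

The trade-off is prompt length: every turn resent consumes tokens, so long dialogs eventually need to drop or summarize older turns.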


**ChatGPT versus davinci-002**

Below is the output of [OpenAI ChatGPT](https://chat.openai.com/auth/login) for the prompt in this notebook.

**ChatGPT:**
**Prompt: "Summarize the history of the Roman Empire:"**
**Response:**
The Roman Empire was founded in 27 BCE by Augustus, following the Roman Republic. It reached its height in the 2nd century CE under the rule of Trajan, and then began a gradual decline due to internal and external pressures. The empire split into two halves in 395 CE, and the western half fell in 476 CE to various invading groups, while the eastern half continued as the Byzantine Empire until its fall to the Ottoman Empire in 1453 CE. The Roman Empire made lasting contributions to Western civilization in areas such as law, language, engineering, and architecture.



---



Now compare with **davinci-002**:

**Prompt: "Summarize the history of the Roman Empire:"**

**Response:** The Roman Empire was founded in 27 BC by Augustus Caesar. It was a time of peace and prosperity. The empire was divided into two parts, the Western Roman Empire and the Eastern Roman Empire. The Western Roman Empire fell in 476 AD. The Eastern Roman Empire, which was also known as the Byzantine Empire, lasted until 1453 AD.

**Follow-up prompt: "What are the contributions of the Roman Empire:"**

**Response:** The Roman Empire contributed to the spread of Christianity, the establishment of a common language, and the development of a system of law.


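Both responses give the same milestones (27 BC/BCE, 476, and 1453), which illustrates that ChatGPT is indeed a sibling of the InstructGPT series. ChatGPT's response is more detailed: it adds the empire's height under Trajan, the date of the split (395 CE), and the empire's lasting contributions.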

# 3. Next Steps

1. Try different inputs with Hugging Face GPT-2, davinci-002, and ChatGPT online.

2. You can fine-tune a davinci model to tailor it to your conversational chatbot needs by following the instructions on [OpenAI](https://platform.openai.com/docs/guides/fine-tuning).
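Fine-tuning data is uploaded as a JSONL file, one training example per line. A minimal sketch that writes dialog pairs in the prompt/completion layout OpenAI's guide describes for davinci-002; check the linked guide for the current format and any recommended separators, since they may change. The training pair, the helper name, and the file name `dialog_train.jsonl` are hypothetical.

```python
import json


def write_finetune_jsonl(pairs, path):
    """Write (prompt, completion) pairs as JSON Lines, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
    return len(pairs)


# Hypothetical training pair for a Roman-history chatbot
pairs = [("What are the contributions of the Roman Empire?",
          "Law, language, engineering, and architecture.")]
count = write_finetune_jsonl(pairs, "dialog_train.jsonl")
print(count, "example(s) written")
```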



