# Overview of GPT-1 for Text Generation

## Decoder-only Transformer
`Input:` A prompt (often referred to as the context), fed into the transformer as a whole.

`Output:` This depends on the goal of the model. For GPT models, the output is a probability distribution over the next token (word) that follows the prompt. The model produces one next-token prediction for the complete input.


![bWnx0](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/6b9f187e-73ea-4c69-81ff-723ba211c934)

### 1. Self-attention Mechanism
It allows the model to focus on the most relevant parts of the input. A single self-attention mechanism is called a head.

![wH0ra](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/cbc885bc-9ba0-4a0d-ac98-ddf940ebb3a9)
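To make the idea of a single head concrete, here is a minimal sketch of scaled dot-product attention with a causal mask, as used in decoder-only models. The function names and the toy query, key, and value vectors are assumptions for illustration; in a real model these vectors are learned linear projections of the token embeddings.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_attention_head(queries, keys, values):
    """One attention head: position i attends only to positions 0..i."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [dot(q, keys[j]) / math.sqrt(d_k) for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [sum(w * values[j][d] for j, w in enumerate(weights))
               for d in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy vectors for a 3-token input.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs, attn = causal_attention_head(queries, keys, values)
for i, w in enumerate(attn):
    print(i, [round(x, 3) for x in w])
```

Each row of weights sums to 1, and the first token can only attend to itself, which is what prevents the model from looking at future words.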


### 2. Training
#### 2a. Basic Training
The basic training process consists of self-supervised learning. Simply put, you gather lots of text, strip the last word from a piece of that text, feed the rest as input into the transformer, check whether the prediction matches the word you cut off, and backpropagate the error.

For example, data = "This is a sample"  
sample = [
  ["This"],
  ["This", "is"],
  ["This", "is", "a"]
]  
targets = ["is", "a", "sample"]
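The example above can be generated with a few lines of Python. This is a minimal sketch that assumes whitespace splitting for readability; GPT models actually operate on subword tokens, and `make_training_pairs` is a hypothetical helper name.

```python
def make_training_pairs(text):
    """Pair every prefix of the text with the word that follows it."""
    words = text.split()  # assumption: whitespace tokenization
    samples = [words[:i] for i in range(1, len(words))]
    targets = words[1:]
    return samples, targets

samples, targets = make_training_pairs("This is a sample")
print(samples)  # [['This'], ['This', 'is'], ['This', 'is', 'a']]
print(targets)  # ['is', 'a', 'sample']
```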

#### 2b. Fine-tuning
After the first stage of training is completed, the model is a large language model. Through fine-tuning, the model can then be adapted to better suit the needs of the final application. This second stage of training is one of the key reasons why ChatGPT and GPT-4 seem so impressive.

### 3. Inference
Inference with a transformer uses the same forward pass as training, without the backpropagation step. You insert a prompt and out comes the next word. For GPT models, this means that the prompt is extended one word at a time. You insert the prompt, and out comes the first word of the answer. That first word is then appended to the prompt, creating a new, slightly longer prompt. This prompt is again passed through the model, giving the prediction of the next word, and so on.
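The loop described above can be sketched as follows. `TOY_MODEL` is a hypothetical stand-in for a trained network: it only looks at the last word, whereas a real transformer conditions on the whole prompt. Picking the most likely word at each step (greedy decoding) is also an assumption; other strategies such as sampling or beam search exist.

```python
# Hypothetical stand-in for a trained model: maps the last word to
# a probability distribution over possible next words.
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}

def next_word_distribution(prompt_words):
    """Return next-word probabilities for the current prompt."""
    return TOY_MODEL.get(prompt_words[-1], {})

def generate(prompt_words, max_new_words):
    words = list(prompt_words)
    for _ in range(max_new_words):
        dist = next_word_distribution(words)
        if not dist:  # the toy model has nothing to predict, so stop
            break
        best = max(dist, key=dist.get)  # greedy: pick the most likely word
        words.append(best)  # the prediction becomes part of the next prompt
    return words

print(" ".join(generate(["the"], max_new_words=5)))  # the cat sat down
```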


GPT-1, or Generative Pre-trained Transformer 1, is a natural language processing model released by OpenAI in 2018. It is based on the Transformer architecture, a type of neural network known for its ability to process sequential data, such as text, very effectively. GPT-1 is designed for text generation tasks, where it predicts the likelihood of the next word given some input context.

## GPT-1
The architecture of GPT-1 is given below:

<img width="224" alt="GPT-architecture" src="https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/433647de-74f8-48ee-9f3c-7eb54cd385fc">


**Key Ideas**
- Stack a bunch of Transformer decoders.
- Self-supervised pre-training helps a lot.
- Fine-tune on multiple tasks.

<img width="993" alt="training" src="https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/4c8227ea-ad0d-4107-aa9c-52e63bbc1b31">

The GPT-1 model is first pre-trained on text from about 7,000 books (the BooksCorpus dataset) and then fine-tuned for the following tasks:
- Textual Entailment
- Question Answering
- Semantic Similarity
- Classification

### Training
#### 1. Unsupervised Pre-training
Given an unsupervised corpus of tokens $u = \{u_1, u_2, \ldots, u_n\}$, we use a standard language modeling objective to maximize the following likelihood:

![image](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/b74617c3-53b5-48e4-99a9-81d95ef0bfbf)

where k is the size of the context window, and the conditional probability P is modeled using a neural
network with parameters Θ. These parameters are trained using stochastic gradient descent.
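As a rough numeric illustration, assume the objective has the form given in the GPT-1 paper, $L_1(u) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)$. The sketch below sums the log-probabilities over a toy sequence. `prob_fn` is a hypothetical placeholder for the neural network, and starting the sum at the second token (so that every term has at least one context token) is a simplifying assumption.

```python
import math

def lm_log_likelihood(tokens, k, prob_fn):
    """Sum of log P(u_i | up to k previous tokens) over the sequence."""
    total = 0.0
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - k):i]  # at most k tokens of left context
        total += math.log(prob_fn(tokens[i], context))
    return total

def uniform_prob(token, context):
    """Hypothetical model: uniform over a 4-token vocabulary."""
    return 0.25

tokens = ["a", "b", "c", "a", "d"]
ll = lm_log_likelihood(tokens, k=3, prob_fn=uniform_prob)
print(round(ll, 4))
```

Training adjusts the parameters so that this sum increases, which is equivalent to minimizing the usual cross-entropy loss.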

#### 2. Supervised fine-tuning
After training the model with the objective in the equation above, we adapt its parameters to the supervised target task.

This gives us the following objective:

![image](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/28fcb7cd-729d-4ebc-921e-96d1e94ddb2d)

where C is the labeled dataset, y is the label, and $\{x^1, x^2, \ldots, x^m\}$ is the sequence of input tokens.

The final objective, which includes language modeling as an auxiliary objective, is:

![image](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/0302215b-d86d-4bfc-a98b-2f332051aac9)

- This auxiliary objective improves the generalization of the supervised model.
- It also accelerates convergence.
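A small sketch of how the two objectives combine, assuming the form used in the GPT-1 paper, $L_3(C) = L_2(C) + \lambda \cdot L_1(C)$ with $\lambda = 0.5$. The classifier function and the numeric value used for $L_1$ are hypothetical placeholders, not outputs of a real model.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def supervised_log_likelihood(examples, logits_fn):
    """L2: sum over (tokens, label) pairs of log P(label | tokens)."""
    total = 0.0
    for tokens, label in examples:
        probs = softmax(logits_fn(tokens))
        total += math.log(probs[label])
    return total

def combined_objective(l2, l1, lam=0.5):
    """L3 = L2 + lambda * L1 (language modeling as auxiliary objective)."""
    return l2 + lam * l1

def toy_logits(tokens):
    """Hypothetical classifier head: equal scores for two classes."""
    return [0.0, 0.0]

examples = [(["adam", "sleeps"], 0), (["adam", "codes"], 1)]
l2 = supervised_log_likelihood(examples, toy_logits)
l1 = -4.0  # placeholder value for the language modeling term
l3 = combined_objective(l2, l1)
print(round(l2, 4), round(l3, 4))
```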


## Tasks
### 1. Textual Entailment

| Premise       | Hypothesis   | Label         |
|---------------|--------------|---------------|
| Adam sleeps   | Adam snores  | Entailment    |
| Adam sleeps   | Adam codes   | Contradiction |
| Adam sleeps   | Michael sings| Neutral       |

![image](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/893462ad-31ea-4b4c-9f97-d2cd85b5ad31)


### 2. Question Answering
| Text       | Question   | Score (similarity)        |
|---------------|--------------|---------------|
| Article   | About article  | value    |

![image](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/3dfcdec9-c6f3-4bff-838a-07d1298e0e46)


### 3. Semantic Similarity
| Text 1       | Text 2   | Label       |
|---------------|--------------|---------------|
| What can I do after MBBS?   | What do I do after MBBS?  | Duplicate    |

![image](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/223e20ab-d9ac-42fd-b261-42bdd5a8b1ca)


### 4. Classification
![image](https://github.com/surajkarki66/MediLeaf_backend/assets/50628520/cb69e38b-d459-4b64-a173-29fc1c7b96e2)
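For all four tasks, GPT-1 turns structured inputs into a single token sequence so that the pre-trained model can be reused with minimal changes. The sketch below follows the input transformations described in the GPT-1 paper, which uses start, delimiter, and extract tokens. The function names are hypothetical, and each text is kept as one string here rather than being split into subword tokens.

```python
START, DELIM, EXTRACT = "<s>", "$", "<e>"

def format_classification(text):
    """Single sequence: start, text, extract."""
    return [[START, text, EXTRACT]]

def format_entailment(premise, hypothesis):
    """Premise and hypothesis joined by the delimiter token."""
    return [[START, premise, DELIM, hypothesis, EXTRACT]]

def format_similarity(text1, text2):
    """No inherent ordering, so both orders are encoded."""
    return [
        [START, text1, DELIM, text2, EXTRACT],
        [START, text2, DELIM, text1, EXTRACT],
    ]

def format_question_answering(context, question, answers):
    """One sequence per candidate answer; each gets its own score."""
    return [[START, context, question, DELIM, a, EXTRACT] for a in answers]

print(format_entailment("Adam sleeps", "Adam snores"))
print(len(format_similarity("What can I do after MBBS?", "What do I do after MBBS?")))
```

The final hidden state at the extract token is what gets passed to the task-specific linear layer.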

**Some Notes Regarding GPT-1**  
- Uses learned position embeddings.
- Activation function: Gaussian Error Linear Unit (GELU).
- Modified L2 regularization with w = 0.01.
- Learning rate for fine-tuning: 6.25e-5.
- Batch size for fine-tuning: 32.
- Epochs
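For reference, the GELU activation mentioned in the notes can be written in a few lines. This sketch shows the exact definition, $x \cdot \Phi(x)$ with $\Phi$ the standard normal CDF, next to the commonly used tanh approximation. Which of the two variants a given GPT-1 implementation uses is not stated here, so treat the choice as an assumption.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Widely used tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(gelu(x), 4), round(gelu_tanh(x), 4))
```

Unlike ReLU, GELU is smooth and lets small negative values pass through slightly attenuated instead of cutting them to zero.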

Let's play a little bit with GPT-1.

In [1]:
!pip install transformers



In [23]:
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

## 1. Convert the sentence into tokens

In [11]:
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
print(tokenizer)

ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.


OpenAIGPTTokenizer(name_or_path='openai-gpt', vocab_size=40478, model_max_length=512, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'unk_token': '<unk>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}


In [24]:
model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt', pad_token_id=tokenizer.eos_token_id)


In [18]:
sentence = 'ChatGPT should be open sourced.'
numeric_ids = tokenizer.encode(sentence, return_tensors = 'pt')

In [19]:
numeric_ids

tensor([[ 879,    3,   30, 1992,  994,  580, 1189, 3586,  940,  239]])

In [20]:
tokenizer.decode(numeric_ids[0])

'chatgpt should be open sourced.'

## 2. Generate text from the sentence

In [25]:
result = model.generate(numeric_ids, max_length = 100, num_beams=5, no_repeat_ngram_size=2, early_stopping=True)

In [26]:
result

tensor([[  879,     3,    30,  1992,   994,   580,  1189,  3586,   940,   239,
           244, 40477,   244,   249,   587,   538,   825,   620,   240,   244,
           520,   603,   239, 40477,   487,   816,   491,   513,   562,   246,
           928,  1113,   240,   488,   674,   487,   603,   240,   500,   246,
          1002,   620,   954,   520,   558,   485,  9261,   485,  1344,   575,
           715,   481,  1465,   498,   481,  4406,   240, 40477,   256,   256,
           249,   825,   512,   761,   770,   239,   249,   719,   797,   485,
           604,   485,   587,   846,   670,   525,   239,  6725, 40477,   520,
           816,   609,   491,   575,   240,   513,   741,  2144,   239,   487,
           509,  2538,   239,   507,   509,   481,  1242,   498,   246,   762]])

In [27]:
generated_text = tokenizer.decode(result[0], skip_special_tokens=True)
print(generated_text)

chatgpt should be open sourced. " 
 " i don't think so, " she said. 
 he looked at her for a long moment, and then he said, in a voice so low she had to strain to hear him over the sound of the engine, 
'' i think you're right. i'm going to have to do something about that. '' 
 she looked up at him, her eyes wide. he was smiling. it was the smile of a man
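The `no_repeat_ngram_size=2` argument used above prevents any two-token sequence from appearing twice in the generated text. The sketch below illustrates the idea in plain Python. It is a simplified illustration and not the `transformers` implementation; `banned_next_tokens` is a hypothetical helper, and words are used in place of token ids for readability.

```python
def banned_next_tokens(generated, ngram_size):
    """Tokens that would complete an n-gram already present in `generated`."""
    if ngram_size < 1 or len(generated) + 1 < ngram_size:
        return set()
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])  # emitting this token would repeat the n-gram
    return banned

generated = ["i", "think", "so", "i"]
print(banned_next_tokens(generated, ngram_size=2))  # {'think'}
```

During generation, the scores of the banned tokens would be set so low that neither greedy decoding nor beam search can select them at that step.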


In [32]:
sentence2 = 'Suraj loves sandhya and'
numeric_ids2 = tokenizer.encode(sentence2, return_tensors = 'pt')

In [33]:
result2 = model.generate(numeric_ids2, max_length = 100, num_beams=5, no_repeat_ngram_size=2, early_stopping=True)

In [34]:
generated_text = tokenizer.decode(result2[0], skip_special_tokens=True)
print(generated_text)

suraj loves sandhya and wants to be a part of it.'
'i don't know what to say to that,'suraj said. 
 saira looked at him with a serious face. she didn't want to talk about her father's death. it was not something she wanted to discuss with suraj. he was the only person she could talk to about it, and she wasn't sure if she should tell him the whole story of her life. the last thing she needed was to
