# Generating News Headlines using GPT2

## GPT2
GPT-2 was released in 2019 as part of the work titled “Language Models are Unsupervised Multitask Learners”. The largest GPT-2 variant is a 1.5B-parameter transformer-based model that performs remarkably well on a variety of NLP tasks. The most striking aspect of this work is that the authors show how a model trained in an unsupervised fashion (language modeling) achieves state-of-the-art results on 7 of the 8 language modeling datasets they tested in a zero-shot setting.

## HuggingFace Transformers
Hugging Face Transformers is one of the most popular Python packages for working with transformer-based NLP models. It provides a high-level API to easily load, fine-tune, and re-train models such as GPT-2, BERT, and T5.

## Fake Headlines
The ABC News dataset is a collection of about a million headlines gathered over a period of 17 years, available [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SYBGZL). We will use this dataset to fine-tune the GPT-2 model. Once it is fine-tuned, we will use it to generate some fake headlines.
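As a rough sketch of what preparing the headlines for fine-tuning could look like, the snippet below reads headlines from a CSV file and joins them into a single training text, ending each headline with GPT-2's `<|endoftext|>` token. The column name `headline_text`, the helper names, and the sample rows are assumptions made for illustration; check them against the file you actually download.

```python
import csv
import io

# GPT-2's end-of-text token, used here to mark headline boundaries.
EOS_TOKEN = "<|endoftext|>"


def load_headlines(csv_file, column="headline_text"):
    """Return the non-empty headlines found in `column` of an open CSV file.

    The column name is an assumption; adjust it to match the dataset file.
    """
    reader = csv.DictReader(csv_file)
    headlines = []
    for row in reader:
        text = (row.get(column) or "").strip()
        if text:
            headlines.append(text)
    return headlines


def build_training_text(headlines):
    """Join headlines into one string, ending each with the end-of-text token."""
    return "".join(f"{headline}{EOS_TOKEN}\n" for headline in headlines)


# Made-up placeholder rows standing in for the real dataset file.
sample_csv = io.StringIO(
    "publish_date,headline_text\n"
    "20200101,example headline one\n"
    "20200102,\n"
    "20200103,example headline two\n"
)
headlines = load_headlines(sample_csv)
training_text = build_training_text(headlines)
print(headlines)
print(training_text)
```

With the real file, the same helper can be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved.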

## Install Transformers

In [6]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.34.0-py3-none-any.whl (7.7 MB)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
Collecting tokenizers<0.15,>=0.14 (from transformers)
  Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Colle

## Prepare Tokenizer

In [7]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def generate_news(prompt, max_length=250, model_name="gpt2-medium"):
    # Load pre-trained model and tokenizer
    model = GPT2LMHeadModel.from_pretrained(model_name)
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)

    # Encode the prompt into a tensor of token ids
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Generate text; gradients are not needed at inference time
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_length=max_length,    # total length in tokens, prompt included
            num_return_sequences=1,
            no_repeat_ngram_size=2,   # never repeat the same 2-gram
            do_sample=True,           # required for top_k, top_p and temperature to take effect
            top_k=50,
            top_p=0.95,
            temperature=0.7,
        )

    # Decode the generated token ids back into text
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return generated_text

prompt = "Breaking news: Scientists have discovered"
generated_article = generate_news(prompt)
print(generated_article)



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Breaking news: Scientists have discovered a new species of spider that lives in the arid desert of southern Mexico.

The new spider, named "Panthera leucophila," is the first spider to be found in Mexico since the 1970s. The new discovery was made by researchers from the University of Texas at Austin and the Mexican National Institute of Anthropology and History. It was published in a paper published online in PLOS ONE. [See Photos of the New Spider]
,
...
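The warning printed before the generated text appears because `generate` received only `input_ids`. One way to address it, sketched below, is to call the tokenizer directly so that it also returns an `attention_mask`, and to pass `pad_token_id` explicitly. The helper name `generate_with_mask` and the `main` wrapper are illustrative and not part of the original notebook; `main` downloads the `gpt2-medium` weights when it is called.

```python
def generate_with_mask(model, tokenizer, prompt, max_length=50):
    """Generate text while passing an attention mask and a pad token id."""
    # Calling the tokenizer (instead of tokenizer.encode) returns both
    # input_ids and attention_mask.
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        # GPT-2 has no dedicated pad token, so reuse the end-of-text token.
        pad_token_id=tokenizer.eos_token_id,
        max_length=max_length,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)


def main():
    # Imported here so the helper above can be read and tested without
    # downloading any weights.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
    model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
    prompt = "Breaking news: Scientists have discovered"
    print(generate_with_mask(model, tokenizer, prompt))
```

Calling `main()` should print a continuation of the prompt without the attention mask and pad token messages.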


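The `generate` call above sets `temperature`, `top_k`, and `top_p`. As an illustration of what these settings do to the next-token distribution, here is a small pure-Python sketch. It is a simplified model of the idea, not the library's actual implementation, and the function name and toy scores are made up for the example.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    # Temperature: divide the scores before the softmax. Values below 1
    # sharpen the distribution, values above 1 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Rank tokens from most to least likely.
    ranked = sorted(scaled, key=scaled.get, reverse=True)
    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    # Softmax over the surviving tokens.
    max_score = max(scaled[tok] for tok in ranked)
    exps = {tok: math.exp(scaled[tok] - max_score) for tok in ranked}
    total = sum(exps.values())
    probs = {tok: exps[tok] / total for tok in ranked}
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok in ranked:
        kept.append(tok)
        cumulative += probs[tok]
        if cumulative >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    kept_total = sum(probs[tok] for tok in kept)
    return {tok: probs[tok] / kept_total for tok in kept}


# Toy scores for four candidate next tokens (invented for the example).
toy_logits = {"spider": 2.0, "planet": 1.0, "fossil": 0.5, "banana": -1.0}
filtered = filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.95)
print(filtered)
```

The next token is then drawn at random from the filtered distribution. With `do_sample=False`, which is the default, `generate` picks the most likely token at each step and these three settings have no effect.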

In [None]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class GPT2Chatbot:
    def __init__(self, model_name="gpt2-medium"):
        # Load the tokenizer and model once so every reply reuses them
        self.tokenizer = GPT2Tokenizer.from_pretrained(model_name)
        self.model = GPT2LMHeadModel.from_pretrained(model_name)

    def generate_response(self, prompt, max_length=150, num_return_sequences=1):
        # Encode the prompt
        input_ids = self.tokenizer.encode(prompt, return_tensors="pt")

        # Generate a response
        with torch.no_grad():
            output = self.model.generate(
                input_ids,
                max_length=max_length,
                num_return_sequences=num_return_sequences,
                no_repeat_ngram_size=2,
                do_sample=True,  # required for top_k, top_p and temperature to take effect
                top_k=50,
                top_p=0.95,
                temperature=0.7,
            )

        # Decode the first generated sequence (it starts with the prompt itself)
        response = self.tokenizer.decode(output[0], skip_special_tokens=True)
        return response

    def chat(self):
        print("GPT-2 Chatbot: Hello! Type 'exit' to end the chat.")
        while True:
            user_input = input("You: ")
            # Strip whitespace so that inputs like "exit " also end the chat
            if user_input.strip().lower() in ["quit", "exit", "bye"]:
                print("GPT-2 Chatbot: Goodbye!")
                break
            else:
                response = self.generate_response(user_input)
                print("GPT-2 Chatbot:", response)

if __name__ == "__main__":
    chatbot = GPT2Chatbot()
    chatbot.chat()


GPT-2 Chatbot: Hello! Type 'exit' to end the chat.
You: How are you



GPT-2 Chatbot: How are you doing?"

"I'm fine."
, "I don't know." "You're fine?" "No." (If you're not sure, ask your partner.)
 "What's wrong?" (You can ask for help.) "Is it okay if I go to the bathroom?" If you don, you can say,
 "No, I'm not going to go." 
If your answer is "yes," you are done.
You are not done if you say "no." If your response is,  "Yes, but I don`t want to do it." You are finished if your reply is ,   "But I want you to."
You: What are you doing 



GPT-2 Chatbot: What are you doing?"

"I'm going to go to the bathroom."
, and the following lines are all from the same scene.
The scene where the two girls are talking about the "good" and "bad" of the game is from a different episode. The scene in which the girls talk about their feelings for each other is in the episode "The Girl Who Loved Me".
In the scene when the girl is talking to her mother, she says, "I love you, Mom." The line is "You're my mom." In the next scene, the mother is saying,
"...I don't know what to say." "What do you mean?" "It's not like I'm saying anything

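In the transcript above, each reply starts by repeating the user's message, because `generate` returns the prompt tokens followed by the continuation. A possible adjustment, shown here as a sketch, decodes only the tokens that come after the prompt. The helper name `generate_reply` is hypothetical, and the function expects an already loaded model and tokenizer such as the ones created in `GPT2Chatbot.__init__`.

```python
def generate_reply(model, tokenizer, prompt, max_length=150):
    """Return only the newly generated text, without the echoed prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,
        max_length=max_length,
        no_repeat_ngram_size=2,
    )
    # For a decoder-only model such as GPT-2, the returned sequence
    # starts with the prompt tokens, so skip that many tokens.
    prompt_length = len(inputs["input_ids"][0])
    new_tokens = output[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Inside the class, `generate_response` could delegate to `generate_reply(self.model, self.tokenizer, prompt)` so that the chat loop prints only the continuation.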