<a href="https://colab.research.google.com/github/smomtahe/-Large-Language-Model-LLM-/blob/main/LLM_GPT_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# This notebook uses the Hugging Face transformers library to load the pre-trained GPT-2 model and tokenizer. It then defines a prompt (the "hot topic") about climate change and has GPT-2 generate a continuation of that prompt.

In [2]:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = TFGPT2LMHeadModel.from_pretrained("gpt2-medium")

# Define your hot topic
hot_topic = "Climate change is a pressing issue that requires urgent action."

# Tokenize the hot topic
input_ids = tokenizer.encode(hot_topic, return_tensors="tf")




All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
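
For a single prompt every position is a real token, so its attention mask is all ones. The mask starts to matter once prompts of different lengths are batched and padded to a common length. The sketch below builds such a batch in plain Python. `pad_batch` is a hypothetical helper, the token ids are made up for illustration, and 50256 is GPT-2's end-of-text id reused as the pad id. Left padding is an assumption of this sketch (a decoder-only model continues from the last position); the notebook itself only ever passes one prompt at a time.

```python
from typing import List, Tuple

EOS_TOKEN_ID = 50256  # GPT-2 end-of-text id, reused here as the pad id


def pad_batch(
    sequences: List[List[int]], pad_id: int = EOS_TOKEN_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id lists to equal length and build the matching attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        # 0 marks padding the model should ignore, 1 marks a real token
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = pad_batch([[10, 11, 12], [20]])  # made-up token ids
print(ids)   # [[10, 11, 12], [50256, 50256, 20]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

Without the mask, the model has no way to tell the padding ids apart from a genuine end-of-text token in the prompt.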


In [3]:
# Generate text based on the hot topic. Greedy decoding (do_sample=False) ignores temperature, so it is omitted.
# Pass an explicit attention mask and pad token id (GPT-2 has no pad token, so EOS is reused), as generate() recommends.
output = model.generate(input_ids, attention_mask=tf.ones_like(input_ids), pad_token_id=tokenizer.eos_token_id,
                        max_length=100, num_return_sequences=1, do_sample=False)


# Decode and print the generated text
for i, sample_output in enumerate(output):
    print(f"Generated Text {i+1}: {tokenizer.decode(sample_output, skip_special_tokens=True)}")


Generated Text 1: Climate change is a pressing issue that requires urgent action. The United States must lead the world in addressing this challenge.

The United States has a long history of working with other countries to address climate change. We have worked with other countries to reduce greenhouse gas emissions, and we have worked with other countries to reduce emissions from coal-fired power plants. We have also worked with other countries to reduce emissions from oil and gas.

We have also worked with other countries to reduce emissions from coal
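
With `do_sample=False` the model picks the highest-scoring token at every step, so a temperature setting cannot change the result. The sketch below illustrates why, using made-up scores for three candidate tokens: dividing the scores by a temperature reshapes the probabilities but never changes which token ranks first. `softmax_with_temperature` and `argmax` are hypothetical helpers written for this illustration, not part of transformers.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    """Index of the largest value, i.e. the token greedy decoding would pick."""
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, 1.0, 0.5]  # made-up next-token scores
sharp = softmax_with_temperature(logits, 0.7)  # low temperature: more peaked
flat = softmax_with_temperature(logits, 1.5)   # high temperature: flatter

print(argmax(sharp) == argmax(flat))  # True: the greedy choice is unchanged
print(sharp[0] > flat[0])             # True: only the probabilities differ
```

Temperature only takes effect when tokens are drawn at random from these probabilities, which is what `do_sample=True` enables.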


In [5]:
# For an online dataset, we can explore the datasets available through TensorFlow Datasets (TFDS), for example:


In [4]:
import tensorflow_datasets as tfds

# Load a dataset from TensorFlow Datasets
dataset = tfds.load("imdb_reviews", split="train")

# Iterate over the dataset and print a sample
for example in dataset.take(1):
    print(example["text"])


Downloading and preparing dataset 80.23 MiB (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...


Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.
tf.Tensor(b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.", shape=(), dtype=string)


In [6]:
# Text completion (greedy decoding; temperature only applies when do_sample=True, so it is omitted)
partial_text = "The cat sat on the"
inputs = tokenizer(partial_text, return_tensors="tf")  # returns input_ids and attention_mask
output = model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"],
                        pad_token_id=tokenizer.eos_token_id, max_length=50, num_return_sequences=1, do_sample=False)
completed_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Completed Text:", completed_text)


Completed Text: The cat sat on the floor, staring at the ceiling.

"I'm sorry, I'm sorry," she said. "I'm sorry."

"I'm sorry," he repeated.

"I'm sorry," she repeated


In [7]:
# Generate text based on a prompt (greedy decoding; temperature only applies when do_sample=True, so it is omitted)
prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="tf")  # returns input_ids and attention_mask
output = model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"],
                        pad_token_id=tokenizer.eos_token_id, max_length=50, num_return_sequences=1, do_sample=False)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated Text:", generated_text)


Generated Text: The quick brown fox jumps over the lazy dog.

The quick brown fox jumps over the lazy dog.

The quick brown fox jumps over the lazy dog.

The quick brown fox jumps over the lazy dog.

The quick
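
Greedy decoding tends to fall into loops like the one above. One common mitigation is n-gram blocking, which `generate` exposes through the `no_repeat_ngram_size` argument: a token is disallowed if emitting it would repeat an n-gram that already appears in the text. The sketch below shows the idea on whitespace-split words. `banned_next_tokens` is a hypothetical helper, and working on words rather than token ids is a simplification made for readability.

```python
from typing import List, Set


def banned_next_tokens(generated: List[str], n: int) -> Set[str]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n:
        return set()
    # The last n-1 tokens form the prefix of whatever n-gram comes next
    prefix = tuple(generated[len(generated) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(generated) - n + 1):
        ngram = generated[i:i + n]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


tokens = "The quick brown fox jumps over the lazy dog . The quick".split()
print(banned_next_tokens(tokens, 3))  # {'brown'}
```

With 3-gram blocking, "brown" is ruled out after the second "The quick", so the model has to continue differently instead of reproducing the first sentence.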


In [8]:
# Generate conversational responses (sampling with temperature 0.7, so the output varies between runs)
dialogue_history = "User: How are you?\nBot:"
inputs = tokenizer(dialogue_history, return_tensors="tf")  # returns input_ids and attention_mask
output = model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"],
                        pad_token_id=tokenizer.eos_token_id, max_length=100, num_return_sequences=1,
                        temperature=0.7, do_sample=True)
generated_response = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated Response:", generated_response)


Generated Response: User: How are you?
Bot: I'm fine.
Bot: I'm not doing anything.
Bot: I'm just having a bad day.
Bot: I'm not doing anything.
Bot: I'm just having a bad day.
Bot: I'm not doing anything.
Bot: I'm just having a bad day.
Bot: I'm not doing anything.
Bot: I'm just having a bad day.
Bot: I'm not doing
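
The model keeps writing further "Bot:" lines until it reaches `max_length`, because nothing tells it where one turn ends. A simple post-processing step is to keep only the first line produced after the dialogue history. The sketch below does this with plain string handling; `first_bot_reply` is a hypothetical helper, and the shortened `generated` string stands in for the decoded model output.

```python
def first_bot_reply(generated: str, history: str) -> str:
    """Return only the first line the model wrote after the dialogue history."""
    # Drop the prompt, which generate() echoes at the start of its output
    continuation = generated[len(history):] if generated.startswith(history) else generated
    # Cut at the first line break, where the model starts inventing further turns
    return continuation.strip().split("\n", 1)[0].strip()


history = "User: How are you?\nBot:"
generated = "User: How are you?\nBot: I'm fine.\nBot: I'm not doing anything."
print(first_bot_reply(generated, history))  # I'm fine.
```

Cutting at the first newline assumes each turn fits on one line, which matches the prompt format used here but may not hold for other dialogue layouts.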


In [48]:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

# Load the smaller base GPT-2 checkpoint and its tokenizer (this replaces the gpt2-medium model loaded earlier)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

# Define a prompt for text generation
prompt = "Generate a Python function to calculate the factorial of a number"




All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [49]:
# Tokenize the prompt
input_ids = tokenizer.encode(prompt, return_tensors="tf")



In [51]:
# Generate text based on the prompt. do_sample defaults to False (greedy decoding), so temperature is omitted.
# Pass an explicit attention mask and pad token id (GPT-2 has no pad token, so EOS is reused), as generate() recommends.
output = model.generate(input_ids, attention_mask=tf.ones_like(input_ids), pad_token_id=tokenizer.eos_token_id,
                        max_length=100, num_return_sequences=1)

# Decode and print the generated text
for i, sample_output in enumerate(output):
    print(f"Generated Text {i+1}: {tokenizer.decode(sample_output, skip_special_tokens=True)}")


Generated Text 1: Generate a Python function to calculate the factorial of a number.

>>> from math import factorial >>> factorial(1, 2, 3)

>>> factorial(1, 2, 3)

>>> factorial(1, 2, 3)

>>> factorial(1, 2, 3)

>>> factorial(1, 2, 3)

>>> factorial(1, 2, 3)

>>> factorial(1,
