In [None]:
# ============================
# 🚀 GPT-2 Fine-tuning in Google Colab
# ============================

# 1️⃣ Install dependencies
!pip install transformers datasets torch --quiet

# 2️⃣ Import libraries
from transformers import GPT2Tokenizer, GPT2LMHeadModel, Trainer, TrainingArguments
from datasets import load_dataset
import torch

# 3️⃣ Load GPT-2 tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# 4️⃣ Create a sample dataset (You can replace this with your own file later)
sample_text = """
Once upon a time, there was a small village near the mountains.
The villagers were kind and lived in harmony with nature.
One day, a traveler arrived and told stories about the outside world.
The children loved hearing his tales and asked for more every night.
"""
with open("sample_dataset.txt", "w") as f:
    f.write(sample_text)

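dataset = load_dataset("text", data_files={"train": "sample_dataset.txt"})
# Drop blank lines (the triple-quoted string starts with one) so no example is pure padding
dataset = dataset.filter(lambda example: example["text"].strip() != "")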

# 5️⃣ Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# 6️⃣ Training arguments
training_args = TrainingArguments(
    output_dir="./results",
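    # `evaluation_strategy` was renamed to `eval_strategy` in newer transformers releases.
    # "no" is the default, so the argument is omitted here and works with either name.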
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    num_train_epochs=3,
    weight_decay=0.01,
    save_steps=10_000,
    save_total_limit=2,
    logging_dir='./logs',
)

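# 7️⃣ Trainer
# With mlm=False the collator copies input_ids into labels (causal LM), so the model returns a loss
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    data_collator=data_collator,
)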

# 8️⃣ Train the model
trainer.train()

# 9️⃣ Save the fine-tuned model
model.save_pretrained("./fine_tuned_gpt2")
tokenizer.save_pretrained("./fine_tuned_gpt2")

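# 🔟 Generate text from the fine-tuned model
prompt = "In the future"
# Keep input_ids and attention_mask together, on the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when do_sample=True
outputs = model.generate(**inputs, max_length=50, do_sample=True, temperature=0.7,
                         pad_token_id=tokenizer.eos_token_id)
print("\nGenerated text:\n", tokenizer.decode(outputs[0], skip_special_tokens=True))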


TypeError: TrainingArguments.__init__() got an unexpected keyword argument 'evaluation_strategy'
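
The `TypeError` above is what a renamed keyword looks like at call time: the installed `TrainingArguments` no longer accepts `evaluation_strategy`. One way to stay compatible across versions is to ask the constructor which name it accepts before building the arguments. The following is only a minimal sketch of that idea. The helper `first_supported_kwarg` and the stand-in class `NewStyleArgs` are hypothetical names introduced for illustration and are not part of transformers.

```python
import inspect

def first_supported_kwarg(target, candidates):
    """Return the first name in `candidates` that `target` accepts as a parameter, else None."""
    accepted = inspect.signature(target).parameters
    for name in candidates:
        if name in accepted:
            return name
    return None

# Hypothetical stand-in for a constructor that only knows the newer keyword.
class NewStyleArgs:
    def __init__(self, output_dir, eval_strategy="no"):
        self.output_dir = output_dir
        self.eval_strategy = eval_strategy

# Try the newer spelling first, then fall back to the older one.
keyword = first_supported_kwarg(NewStyleArgs, ["eval_strategy", "evaluation_strategy"])

kwargs = {"output_dir": "./results"}
if keyword is not None:
    kwargs[keyword] = "no"

args = NewStyleArgs(**kwargs)
print(keyword)  # eval_strategy
```

In the notebook, passing `TrainingArguments` in place of the stand-in class should behave the same way, since the helper only inspects the constructor's signature. Because `"no"` is the default, simply omitting the argument is the simpler option when no evaluation is needed.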

In [None]:
prompts = [
    "Once upon a time",
    "Artificial Intelligence will",
    "In the year 2050",
    "The secret to happiness is",
    "My favorite food is"
]

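for p in prompts:
    # Pass the attention mask and pad token id explicitly to avoid the generation warnings
    inputs = tokenizer(p, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=50, temperature=0.7, top_p=0.9, do_sample=True,
                             pad_token_id=tokenizer.eos_token_id)
    print(f"\nPrompt: {p}")
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))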


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

Prompt: Once upon a time
Once upon a time, the world of the gods was once a place of utter darkness, of the cold and the wet, of the damp and the wintry. But now it is filled with life and beauty, and of the infinite beauty of


Prompt: Artificial Intelligence will
Artificial Intelligence will have a unique ability to understand the way we use the internet, and how we interact with it. The software will also be able to help with the use of our social media accounts, as well as to create a more efficient system


Prompt: In the year 2050
In the year 2050, the number of people living in poverty will exceed the number of people living in the country at any given moment.

The current poverty rate is 5.4 per cent, down from 6.3 per cent in 2014.


Prompt: The secret to happiness is
The secret to happiness is the determination to have a good life. Happiness is not about being happy, it is about being happy. Happiness is about being happy.

The happiest person in the world is the one who lives in harmony with the universe

Prompt: My favorite food is
My favorite food is always in the freezer. I love to eat it right away. It's my favorite food to eat every day. I love it on my plates. I love to serve it in my car. It's my favorite food to eat
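
The samples above were drawn with `temperature=0.7` and `top_p=0.9`. As a rough illustration of what those two settings do to the next-token distribution, here is a small pure-Python sketch. It is not the library's implementation, and the logits, `apply_temperature`, and `top_p_filter` are hypothetical names made up for this example.

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by the temperature, then softmax them into probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalize over the kept tokens
    return filtered

toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens

probs_default = apply_temperature(toy_logits, 1.0)
probs_sharp = apply_temperature(toy_logits, 0.7)  # below 1.0 sharpens the distribution
nucleus = top_p_filter(probs_sharp, 0.9)          # low-probability tail is zeroed out

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_sharp])
print([round(p, 3) for p in nucleus])
```

A temperature below 1.0 shifts probability toward the most likely token, and the top-p step then discards the unlikely tail before sampling. Both only matter when `do_sample=True`, since greedy decoding always picks the single most likely token.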



