# Fine-Tuning GPT-2 for Creative Story Generation


## Aim
To fine-tune a pre-trained GPT-2 model for creative story generation.

## Objective
To understand how a large language model can be adapted to a specific task using fine-tuning on custom data.



## Introduction 

**GPT-2 (Generative Pre-trained Transformer-2)** is a transformer-based language model developed by OpenAI.

- It is pre-trained on a large corpus of web text (OpenAI's WebText dataset)
- Fine-tuning helps adapt it to a specific task
- Here, GPT-2 is fine-tuned for **creative story generation**
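GPT-2 is trained with a causal language-modeling objective. Each token is predicted from the tokens before it, and the loss is the average negative log-likelihood of the true next tokens. Fine-tuning keeps this objective and only changes the data. The pure-Python sketch below shows the computation. `next_token_probs` is a hypothetical stand-in for the model; a real GPT-2 computes these probabilities from logits over its 50,257-token vocabulary.

```python
import math

def causal_lm_loss(token_ids, next_token_probs):
    """Average negative log-likelihood of each token given all tokens before it."""
    losses = []
    for t in range(1, len(token_ids)):
        prefix, target = token_ids[:t], token_ids[t]
        # next_token_probs is a hypothetical model: prefix -> {token_id: probability}
        p = next_token_probs(prefix).get(target, 1e-12)  # small floor avoids log(0)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

# Toy "model" that is uniform over a 4-token vocabulary.
def uniform_model(prefix):
    return {token_id: 0.25 for token_id in range(4)}

print(causal_lm_loss([0, 1, 2, 3], uniform_model))  # ≈ ln(4) ≈ 1.386
```

A uniform guess over V tokens gives a loss of ln V. Lower loss means the model assigns higher probability to the actual continuation.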

## Why Is Fine-Tuning Needed?

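- A pre-trained model produces general-purpose text that reflects its broad training data
- Fine-tuning on example stories steers the model toward the desired genre and tone
- Helps generate domain-specific stories
- Improves the consistency of style across generations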

## Step 1: Install Required Libraries

In [1]:
!pip install transformers datasets torch --quiet




## Step 2: Import Libraries

In [2]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments
from datasets import Dataset
import torch



## Step 3: Create Story Dataset

In [3]:
# Sample story dataset (exam-friendly)
stories = [
    "Once upon a time, a young robot dreamed of becoming human.",
    "In a small village, there lived a boy who could talk to animals.",
    "A mysterious door appeared in the forest every full moon.",
    "The future city was powered entirely by artificial intelligence."
]

dataset = Dataset.from_dict({"text": stories})
dataset

Dataset({
    features: ['text'],
    num_rows: 4
})
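Four hand-written sentences are enough for a demo, but a real run needs many more stories. Here is a minimal sketch, assuming the stories are kept in plain text separated by blank lines. The hypothetical helper `stories_to_columns` turns such text into the `{"text": [...]}` column dictionary that `Dataset.from_dict` accepts.

```python
def stories_to_columns(raw_text):
    """Split blank-line-separated stories into a {"text": [...]} column dict."""
    # Hypothetical helper: one story per paragraph, blank lines as separators.
    paragraphs = [p.strip() for p in raw_text.split("\n\n")]
    # Collapse internal line breaks and extra spaces so each story is one string.
    return {"text": [" ".join(p.split()) for p in paragraphs if p]}

raw = """Once upon a time, a young robot dreamed of becoming human.

A mysterious door appeared
in the forest every full moon.
"""
columns = stories_to_columns(raw)
print(columns["text"])
# With the datasets library installed: dataset = Dataset.from_dict(columns)
```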

## Step 4: Load Pre-trained GPT-2 Model and Tokenizer

In [4]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# GPT-2 has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

Loading weights: 100%|██████████| 148/148 [00:00<00:00, 321.59it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: gpt2
Key                  | Status     |  | 
---------------------+------------+--+-
h.{0...11}.attn.bias | UNEXPECTED |  | 

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.


## Step 5: Tokenize the Dataset

In [5]:
def tokenize_function(example):
    return tokenizer(example['text'], truncation=True, padding='max_length', max_length=64)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
tokenized_dataset

Map: 100%|██████████| 4/4 [00:00<00:00, 85.25 examples/s]


Dataset({
    features: ['text', 'input_ids', 'attention_mask'],
    num_rows: 4
})
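For each story, `truncation=True, padding='max_length', max_length=64` cuts long inputs to 64 tokens and right-pads short ones with the pad token (here the EOS id 50256). The `attention_mask` marks real tokens with 1 and padding with 0. The pure-Python sketch below mimics this for a single sequence. `pad_and_truncate` is a hypothetical helper, and the token ids in the example are made up.

```python
def pad_and_truncate(token_ids, max_length=64, pad_id=50256):
    """Mimic truncation=True, padding='max_length' for one sequence (right padding)."""
    ids = token_ids[:max_length]                   # truncation
    n_pad = max_length - len(ids)
    input_ids = ids + [pad_id] * n_pad             # pad with the EOS id reused as pad
    attention_mask = [1] * len(ids) + [0] * n_pad  # 0 = padded position, ignored
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Made-up token ids, short max_length for readability.
print(pad_and_truncate([11, 22, 33], max_length=6))
```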

## Step 6: Define Training Arguments

In [6]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./gpt2-story-model",
    num_train_epochs=1,          # kept low for a quick exam demo
    per_device_train_batch_size=2,
    logging_steps=5,             # only 2 steps run on 4 rows, so no loss is logged
    learning_rate=5e-5,
    report_to="none"             # disable external loggers (e.g. W&B, TensorBoard)
)
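The length of this tiny run follows from the arguments above. The sketch below works it out, assuming a single device, no gradient accumulation, and the default linear learning-rate schedule with no warmup.

```python
import math

num_rows = 4                  # len(dataset)
per_device_batch_size = 2
num_devices = 1               # assumption: a single CPU or GPU
num_train_epochs = 1
learning_rate = 5e-5

steps_per_epoch = math.ceil(num_rows / (per_device_batch_size * num_devices))
total_steps = steps_per_epoch * num_train_epochs
print(total_steps)            # 2, matching global_step=2 in the training output

# Linear decay with no warmup (assumed defaults): lr is scaled by (total - step) / total.
lrs = [learning_rate * (total_steps - step) / total_steps for step in range(total_steps)]
print(lrs)
```

Because only 2 optimizer steps run, `logging_steps=5` is never reached, so no intermediate training loss is logged during the run.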


## Step 7: Fine-Tune the Model

In [7]:
from transformers import DataCollatorForLanguageModeling


In [8]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False   # causal LM objective: GPT-2 is not a masked LM
)
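With `mlm=False`, `DataCollatorForLanguageModeling` copies `input_ids` into `labels` and replaces padding positions with -100, which the cross-entropy loss ignores. GPT-2 then shifts the labels internally so that each position predicts the next token. The pure-Python sketch below covers only the label step; `causal_lm_labels` is a hypothetical helper name.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss

def causal_lm_labels(input_ids, pad_id=50256):
    """Copy input_ids into labels, masking padding positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]

# Made-up token ids followed by two padding tokens.
print(causal_lm_labels([11, 22, 33, 50256, 50256]))
```

Because the pad token is the EOS token in this notebook, any genuine EOS token in the data would also be masked out of the loss.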


In [9]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator
)

trainer.train()


`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.


Step,Training Loss


Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  1.22it/s]


TrainOutput(global_step=2, training_loss=4.046947479248047, metrics={'train_runtime': 8.4123, 'train_samples_per_second': 0.475, 'train_steps_per_second': 0.238, 'total_flos': 130646016000.0, 'train_loss': 4.046947479248047, 'epoch': 1.0})
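Cross-entropy loss is easier to interpret as perplexity, exp(loss), which is roughly the number of equally likely tokens the model is choosing between. The small sketch below uses the numbers reported in the `TrainOutput` above. Note that this is a training-set figure averaged over only 2 steps, not a held-out evaluation.

```python
import math

train_loss = 4.046947479248047   # 'train_loss' from the TrainOutput
train_runtime = 8.4123           # 'train_runtime' in seconds
num_samples = 4                  # rows seen in 1 epoch

perplexity = math.exp(train_loss)
print(f"training perplexity: {perplexity:.1f}")
# Throughput check; should agree with 'train_samples_per_second'.
print(f"samples/sec: {num_samples / train_runtime:.3f}")
```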

## Step 8: Generate Creative Story

In [10]:
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)  # keep inputs on the model's device

output = model.generate(
    **inputs,
    max_length=50,       # total length in tokens, including the prompt
    do_sample=True,
    top_k=50,
    top_p=0.95
)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Once upon a time, there of, or upon an end of an unending sequence of, we are not the first and we are not the last.


The fourth of the last is in order of or on our last, which is
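The `top_k=50` and `top_p=0.95` settings restrict sampling at each step. Top-k keeps the k most probable tokens. Top-p (nucleus sampling) then keeps the smallest set of those whose cumulative probability reaches p and renormalizes. The sketch below applies top-k and then top-p to a toy next-token distribution. The function names and toy probabilities are hypothetical, and the library itself works on logits rather than on a dict of probabilities.

```python
import random

def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Renormalized probabilities after top-k, then top-p (nucleus) filtering."""
    # Top-k: keep only the k most likely tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def sample(probs, rng=random):
    """Draw one token according to the (filtered) probabilities."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

toy = {"the": 0.5, "a": 0.25, "dragon": 0.125, "zebra": 0.125}
filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.7)
print(sorted(filtered))  # ['a', 'the']
print(sample(filtered, random.Random(0)))
```

Lowering `top_p` or `top_k` makes output more conservative, while raising them allows more surprising (and often less coherent) continuations.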


## Observations (Exam Ready Points)

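- The model continues the prompt with fluent-sounding text, although the sample above is only loosely coherent
- With only 4 short stories and 2 training steps, the shift toward the training style is small
- More story data and more epochs are needed before the output becomes clearly more domain-specific than base GPT-2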

## Applications

- Story and script writing
- Game narrative generation
- Content creation
- Creative writing assistants

## Advantages and Limitations

**Advantages:**
- Adapts generated text to the style of the training stories
- Produces task-specific output with far less data than training from scratch

**Limitations:**
- Requires a curated dataset of example stories
- Computationally expensive for larger models and datasets (a GPU is recommended)

## Conclusion 

Fine-tuning GPT-2 adapts a pre-trained language model to generate creative stories by learning patterns from custom story data.