# Fantasy Story Generation using Transformer Models: A Project Report

This report outlines the process of developing a text generation model capable of producing creative short stories in the fantasy genre by fine-tuning a pre-trained GPT-2 transformer model.

---

## 1. Project Scope and Goals

The first step is to define the project's objectives, target audience, and constraints.

In [None]:
project_goal = """
The primary goal of this project is to develop a text generation model capable of producing high-quality, creative short stories in the fantasy genre.
"""

text_type = "Creative short stories in the fantasy genre."

target_audience = """
Writers and creative individuals seeking inspiration or assistance in generating story ideas and narratives.
"""

constraints_requirements = """
-   **Computational Resources:** Training will be performed on cloud-based GPUs (e.g., Amazon SageMaker, Google Colab Pro). A limited budget requires efficient model architecture selection and training strategies.
-   **Time Limitations:** Project completion within 3 months. This includes data collection, model selection, training, evaluation, and documentation.
-   **Ethical Considerations:** The generated text should not contain harmful, biased, or offensive content. Mechanisms for filtering or mitigating such outputs will be necessary.
-   **Data Availability:** Sourcing a diverse and high-quality dataset of fantasy short stories might be challenging due to copyright and accessibility. Focus on publicly available datasets or creative commons licensed works.
-   **Model Complexity:** The chosen model should be complex enough to capture the nuances of creative writing but not so complex that it becomes computationally infeasible to train within the given time and resource constraints.
"""

print("Project Goal:")
print(project_goal)
print("\nType of Text to be Generated:")
print(text_type)
print("\nTarget Audience:")
print(target_audience)
print("\nConstraints and Requirements:")
print(constraints_requirements)

---

## 2. Data Collection and Preprocessing

A suitable dataset is essential for training the model. The process involves gathering, cleaning, and preparing the data for the model.

### 2.1. Data Collection (Simulated)

Due to copyright and accessibility challenges, sourcing a large dataset of fantasy stories can be difficult. For this project, we simulate a small sample dataset to demonstrate the workflow.

In [None]:
import pandas as pd

# Simulate a sample dataset of fantasy short stories
# In a real project, this would involve web scraping, using APIs, or downloading datasets
sample_data = {
    'story_id': [1, 2, 3],
    'story_text': [
        "In the ancient forest, where shadows danced and whispers echoed, lived Elara, an enchantress of unparalleled power. Her eyes, the color of the deepest forest pool, held secrets older than the mountains. One day, a quest arrived...",
        "A lone knight, clad in silver armor, stood before the towering gates of the Dragon's Lair. Sir Kaelen had faced countless foes, but none as fearsome as the beast that guarded the lost artifact. The air crackled with magic...",
        "Deep beneath the earth, the dwarves of Ironpeak delved for mithril. Their hammers rang against stone, a rhythm as old as time. Among them was Borin, a young dwarf with dreams of adventure beyond the mines. A hidden passage awaited..."
    ]
}
df = pd.DataFrame(sample_data)

print("Sample Data:")
display(df)
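
For a real corpus, the simulated dictionary would be replaced by a loader. The following is a minimal sketch, assuming the stories are stored as one UTF-8 `.txt` file per story in a single directory; the function name `load_stories` and the directory name in the usage comment are hypothetical and not part of the original project.

```python
from pathlib import Path

def load_stories(directory):
    """Read every .txt file in `directory` into a list of story records."""
    records = []
    # Sort the paths so story ids are stable between runs
    for story_id, path in enumerate(sorted(Path(directory).glob("*.txt")), start=1):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty files (their ids are left unused)
            records.append({"story_id": story_id, "story_text": text})
    return records

# Usage (hypothetical directory name):
# df = pd.DataFrame(load_stories("data/fantasy_stories"))
```

The records use the same `story_id` and `story_text` keys as the simulated data, so the rest of the workflow would not need to change.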

### 2.2. Data Cleaning

The raw text is cleaned by removing punctuation and special characters, converting it to lowercase, and collapsing repeated whitespace.

In [None]:
import re

def clean_text(text):
    # Remove special characters and punctuation (keeping spaces)
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    # Convert to lowercase
    text = text.lower()
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    return text

df['cleaned_story_text'] = df['story_text'].apply(clean_text)

print("Cleaned Data:")
display(df[['story_id', 'cleaned_story_text']])
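
The cleaning above is aggressive: it removes apostrophes and sentence punctuation (turning "Dragon's" into "dragons") and discards capitalization. GPT-2's tokenizer is case-sensitive and the model was pre-trained on naturally punctuated text, so a lighter cleaning step may be worth comparing. The sketch below is one possible alternative, not the approach used in this report; the function name `light_clean` is hypothetical.

```python
import re
import unicodedata

def light_clean(text):
    """Normalize text while keeping case and punctuation."""
    # Fold compatibility characters (e.g. non-breaking spaces) into plain forms
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters, but keep ordinary whitespace
    text = "".join(
        ch for ch in text
        if ch in "\n\t " or unicodedata.category(ch)[0] != "C"
    )
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()

print(light_clean("  Sir Kaelen's  quest...\n\tbegan. "))
# Sir Kaelen's quest... began.
```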

### 2.3. Data Splitting

The dataset is split into training and testing sets. **Note:** With only 3 samples, this split (2 for training, 1 for testing) is insufficient for meaningful training but is included here to demonstrate the standard procedure.

In [None]:
from sklearn.model_selection import train_test_split

# Splitting data into training and test sets
train_df, test_df = train_test_split(df, test_size=0.33, random_state=42)

print("Train Set:")
display(train_df)

print("\nTest Set:")
display(test_df)

---

## 3. Model Selection and Setup

We chose the GPT-2 model for this project due to its proven effectiveness in text generation and the extensive support provided by the Hugging Face ecosystem.

### 3.1. Load Pre-trained Model and Tokenizer

We load the pre-trained GPT-2 model and its corresponding tokenizer. GPT-2 does not define a padding token, so the tokenizer's padding token is set to its end-of-sequence (EOS) token, a common workaround for causal language models like GPT-2.

In [None]:
!pip install "transformers[torch]" datasets scikit-learn -q

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Choose and load the pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Set the pad_token_id to the eos_token_id for GPT-2
tokenizer.pad_token_id = tokenizer.eos_token_id
model.config.pad_token_id = model.config.eos_token_id

### 3.2. Prepare Datasets for Training

The cleaned text is tokenized and formatted into a Hugging Face `Dataset` object. A `labels` column is then added so that the model can compute the language modeling loss during training. The labels copy `input_ids`, except that padded positions are set to `-100`, the value the loss function ignores. Because the padding token is the EOS token here, the padded positions are identified through `attention_mask`; without this masking, most of the loss on these short stories would come from padding.

In [None]:
from datasets import Dataset

# Tokenize the cleaned story text
def tokenize_function(examples):
    # Truncate to 256 tokens and pad shorter sequences up to that length
    return tokenizer(examples["cleaned_story_text"], truncation=True, padding="max_length", max_length=256)

# Convert the pandas DataFrames to Hugging Face Datasets
# (preserve_index=False keeps the shuffled pandas index from becoming an extra column)
train_dataset = Dataset.from_pandas(train_df[['cleaned_story_text']], preserve_index=False)
test_dataset = Dataset.from_pandas(test_df[['cleaned_story_text']], preserve_index=False)

# Apply the tokenization function to the datasets
tokenized_train_dataset = train_dataset.map(tokenize_function, batched=True, remove_columns=["cleaned_story_text"])
tokenized_test_dataset = test_dataset.map(tokenize_function, batched=True, remove_columns=["cleaned_story_text"])

# Add the 'labels' column for loss calculation, masking padded positions with -100
def add_labels(examples):
    return {'labels': [
        [token if mask == 1 else -100 for token, mask in zip(ids, attention)]
        for ids, attention in zip(examples['input_ids'], examples['attention_mask'])
    ]}

tokenized_train_dataset = tokenized_train_dataset.map(add_labels, batched=True)
tokenized_test_dataset = tokenized_test_dataset.map(add_labels, batched=True)


print("Training data prepared for GPT-2.")
print(tokenized_train_dataset)
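
With `truncation=True` and `max_length=256`, any tokens beyond the first 256 of a story are discarded. The three sample stories are short enough that this does not matter, but a larger corpus with longer stories would lose text. A common alternative is to split long token sequences into consecutive fixed-size blocks. The following is a minimal sketch of that idea in plain Python; the helper name `chunk_token_ids` is hypothetical, and dropping the short final remainder is one design choice among several.

```python
def chunk_token_ids(token_ids, block_size=256):
    """Split one long token sequence into consecutive blocks of `block_size`.

    The trailing remainder shorter than `block_size` is dropped.
    """
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]

blocks = chunk_token_ids(list(range(10)), block_size=4)
print(blocks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Since every block has the same length, no padding is needed for sequences prepared this way.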

---

## 4. Model Training

The model is fine-tuned on our prepared fantasy story dataset. We use the `Trainer` class from the `transformers` library to handle the training loop.

In [None]:
from transformers import Trainer, TrainingArguments

# Configure training arguments
training_args = TrainingArguments(
    output_dir="./gpt2-fantasy-stories",      # Output directory for checkpoints and logs
    overwrite_output_dir=True,
    num_train_epochs=3,                      # Number of training epochs
    per_device_train_batch_size=1,           # Batch size (set to 1 due to small sample size)
    save_steps=10_000,                       # Save checkpoint every 10,000 steps
    save_total_limit=2,                      # Limit the total number of checkpoints
    logging_steps=1,                         # Log every step
    report_to="none"                         # Disable external reporting
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
)

# Start training
print("Starting model training...")
trainer.train()
print("Training finished.")

---

## 5. Model Evaluation

After training, we evaluate the model's performance by generating text samples and manually assessing their quality.

### 5.1. Generate Text Samples

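We create a function to generate stories from various prompts to see how the model performs. Note that the prompts use normal capitalization and punctuation, whereas the training text was lowercased and stripped of punctuation, so the prompts do not match the style of the fine-tuning data.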

In [None]:
def generate_story(model, tokenizer, prompt, max_length=150, num_return_sequences=1):
    """Generates text using the trained model."""
    model.eval()  # Disable dropout; training can leave the model in training mode
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

    output_sequences = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # Passed explicitly because pad and EOS share an id
        max_length=max_length,                    # Total length in tokens, including the prompt
        num_return_sequences=num_return_sequences,
        no_repeat_ngram_size=2,                   # Forbid any 2-gram from appearing twice
        do_sample=True,                           # Sample instead of greedy decoding
        top_k=50,                                 # Consider only the 50 most likely tokens
        top_p=0.95,                               # Nucleus sampling threshold
        temperature=0.8,                          # Values below 1 make sampling less random
        pad_token_id=tokenizer.eos_token_id,
    )

    generated_stories = []
    for i, output in enumerate(output_sequences):
        text = tokenizer.decode(output, skip_special_tokens=True)
        generated_stories.append(f"--- Generated Story {i+1} ---\n{text}\n")

    return generated_stories

# Generate text samples with different prompts
prompts = [
    "In a hidden valley, a young wizard discovered a forgotten artifact",
    "The ancient dragon awakened from its centuries-long slumber",
    "Deep within the enchanted forest, a brave warrior sought the legendary sword"
]

print("Generating text samples...\n")
for prompt in prompts:
    print(f"Prompt: '{prompt}'")
    generated_texts = generate_story(model, tokenizer, prompt)
    for text in generated_texts:
        print(text)
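
The effect of `temperature`, `top_k`, and `top_p` can be illustrated on a toy next-token distribution. The function below is a simplified pure-Python illustration of the general technique, not the `transformers` implementation; the function name, the four-word vocabulary, and the logits are made up for the example.

```python
import math
import random

def filter_distribution(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Return {token: probability} after temperature, top-k, and top-p filtering."""
    # Temperature: divide logits before the softmax; values < 1 sharpen the distribution
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Top-k: keep the k most probable tokens, then renormalize
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    ranked = [(tok, p / k_total) for tok, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

toy_logits = {"dragon": 2.0, "sword": 1.0, "castle": 0.0, "the": -1.0}
filtered = filter_distribution(toy_logits, temperature=1.0, top_k=3, top_p=0.85)
print(sorted(filtered))  # ['dragon', 'sword']

# Draw the next token from the filtered distribution
next_token = random.choices(list(filtered), weights=list(filtered.values()))[0]
```

In this toy case `top_k=3` removes "the", and `top_p=0.85` then removes "castle", so sampling only ever picks "dragon" or "sword".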

### 5.2. Manual Assessment and Limitations

The generated samples are manually reviewed for quality, coherence, and relevance.

**Summary of Manual Assessment:**

The generated text demonstrates a basic understanding of the prompts but is far from the goal of high-quality, creative storytelling.
- **Quality:** The grammar is generally correct, but the sentences are simple.
- **Coherence:** The stories often lack a clear narrative structure and can become nonsensical.
- **Relevance:** The outputs stay on topic but fail to develop the initial idea in a meaningful way.

**Limitations and Conclusion:**

The poor performance is expected and is a direct result of the **extremely small training dataset**. A model cannot learn the complex nuances of creative writing from only two sample stories. To achieve the project's goal, a significantly larger and more diverse dataset is required.

**Considerations for Future Quantitative Evaluation:**

With a larger dataset, a more rigorous evaluation would include:
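1.  **Perplexity:** The exponential of the mean per-token cross-entropy on a held-out test set; lower values mean the model predicts the text better.
2.  **Human Evaluation:** The gold standard for creative tasks, where evaluators score stories on creativity, coherence, and fluency.
3.  **Automated Metrics:** Metrics like BLEU or ROUGE compare outputs against reference texts, so they should be used cautiously for open-ended generation, where many different stories are equally valid.
4.  **Diversity Metrics:** Measures such as the proportion of distinct n-grams, to check that the model generates varied and non-repetitive text.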
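
Two of these metrics are simple enough to sketch directly. The helpers below are illustrative only: the function names are hypothetical, and `distinct_n` uses whitespace tokenization for simplicity rather than the model's tokenizer.

```python
import math

def perplexity(mean_cross_entropy):
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)

def distinct_n(texts, n=2):
    """Fraction of n-grams that are unique across all texts (higher means more varied)."""
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

print(perplexity(0.0))  # 1.0
print(distinct_n(["the dragon slept", "the dragon woke"], n=2))  # 0.75
```

With the `Trainer` used in this report, the mean cross-entropy would typically come from `trainer.evaluate()["eval_loss"]`, assuming the evaluation set is large enough for the number to be meaningful.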

---

## 6. Model Deployment (Conceptual)

Deploying the model would make it accessible for real-world applications. Below are potential options and considerations.

**Potential Deployment Options:**
* **Cloud Platforms:**
    * **Amazon SageMaker, Google Cloud Vertex AI (the successor to AI Platform), Azure Machine Learning:** Managed services that simplify deploying, scaling, and managing ML models.
    * **Hugging Face Inference API/Endpoints:** A convenient and specialized service for hosting models from the Hugging Face Hub.
* **Creating a Custom REST API:**
    * Using frameworks like **Flask** or **FastAPI** to build a custom web service that provides full control over the model and infrastructure.

**Factors Influencing Deployment Choice:**
* **Traffic & Latency:** High-traffic, low-latency applications benefit from scalable cloud platforms.
* **Budget:** Cloud platforms have a pay-as-you-go model, while self-hosting may be cheaper for consistent loads.
* **Scalability & Expertise:** Cloud services offer easy scaling but may abstract away control. Custom APIs require more technical expertise to build and maintain.

**General Deployment Steps:**
1.  **Save** the final trained model and tokenizer.
2.  **Package** the model, code, and dependencies (e.g., in a Docker container).
3.  **Deploy** the package to the chosen platform (e.g., cloud endpoint, server).
4.  **Test** the deployed endpoint to ensure it functions correctly.
5.  **Integrate** the endpoint with the final application (e.g., a writer's tool).
6.  **Monitor and Maintain** the model's performance and resource usage.
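
Whichever platform is chosen, the endpoint needs to validate incoming requests before calling the model. The sketch below shows that logic in a framework-agnostic form, so it could be wrapped by a Flask or FastAPI route. Everything in it is an assumption made for illustration: the function name `handle_generate_request`, the payload fields, the response shape, and the length limit are not part of the original project.

```python
def handle_generate_request(payload, generate_fn, max_allowed_length=300):
    """Validate a JSON-like payload and return a response dict.

    `generate_fn(prompt, max_length)` is any callable that returns a list of stories.
    """
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return {"status": 400, "error": "'prompt' must be a non-empty string"}

    max_length = payload.get("max_length", 150)
    valid_int = isinstance(max_length, int) and not isinstance(max_length, bool)
    if not valid_int or not 1 <= max_length <= max_allowed_length:
        return {"status": 400,
                "error": f"'max_length' must be an integer in [1, {max_allowed_length}]"}

    stories = generate_fn(prompt.strip(), max_length)
    return {"status": 200, "stories": stories}

# Usage with a stand-in generator (a real service would call the fine-tuned model)
def fake_generate(prompt, max_length):
    return [f"{prompt} ..."]

print(handle_generate_request({"prompt": "The ancient dragon"}, fake_generate))
# {'status': 200, 'stories': ['The ancient dragon ...']}
```

Capping `max_length` on the server side keeps a single request from monopolizing the GPU, which matters for the latency and budget considerations listed above.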

---

## 7. Summary and Future Work

### Summary
This project successfully established a complete pipeline for fine-tuning a GPT-2 model for fantasy story generation. Key phases, including data preprocessing, model setup, training, and evaluation, were executed. However, the primary limitation was the severe lack of training data, which prevented the model from learning to generate high-quality, creative narratives. The manual evaluation highlighted this, showing outputs that were simplistic and lacked narrative depth.

### Future Work
The most critical next step is to **acquire a significantly larger and more diverse dataset** of fantasy stories. Once a substantial dataset is available, the following steps can be taken to improve the model:
* **Implement a rigorous evaluation framework**, including both automated metrics and human evaluation, to accurately benchmark performance.
* **Experiment with different model architectures** or larger models (e.g., `gpt2-medium`, GPT-Neo) if resources permit.
* **Explore parameter-efficient fine-tuning techniques** like LoRA (Low-Rank Adaptation), which trains a small number of added parameters instead of the full model and can reduce memory use and training cost.
* **Develop a user-friendly interface or API** to make the model accessible to the target audience of writers and creatives.
* **Implement content filtering mechanisms** to ensure the generated text aligns with ethical guidelines and avoids harmful outputs.
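
As a starting point for the content filtering item, a simple keyword blocklist can be applied to generated stories before they are shown to a user. This is a deliberately crude sketch: the function names and the placeholder blocklist are hypothetical, and a production system would likely need a trained classifier rather than keyword matching alone.

```python
import re

def is_allowed(text, blocked_terms):
    """Return False if any blocked term appears as a whole word (case-insensitive)."""
    for term in blocked_terms:
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            return False
    return True

def filter_stories(stories, blocked_terms):
    """Keep only the stories that contain none of the blocked terms."""
    return [story for story in stories if is_allowed(story, blocked_terms)]

# Placeholder blocklist for illustration only
blocked = ["forbidden"]
stories = ["The knight entered the Forbidden keep.", "The dwarf found a hidden passage."]
print(filter_stories(stories, blocked))
# ['The dwarf found a hidden passage.']
```

Whole-word matching avoids rejecting harmless words that merely contain a blocked term as a substring.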