# ✍️ Text Generation: GPT, T5 & LLMs

Modern text generation with transformer-based language models.

## Learning Outcomes
- GPT-style autoregressive generation
- T5 for text-to-text tasks
- Controlled text generation
- Practical applications

**Level**: Advanced | **Time**: 75 min | **GPU**: Recommended

In [None]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, T5ForConditionalGeneration, T5Tokenizer
from transformers import pipeline
import warnings
warnings.filterwarnings('ignore')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

## 1. GPT-2 Text Generation

In [None]:
# Load GPT-2
gpt2_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
gpt2_model = GPT2LMHeadModel.from_pretrained('gpt2').to(device)
gpt2_model.eval()

print(f"GPT-2 Parameters: {sum(p.numel() for p in gpt2_model.parameters()):,}")

In [None]:
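def generate_text(model, tokenizer, prompt, max_length=100, temperature=0.7, top_p=0.9):
    """Sample a continuation of `prompt`. Note that `max_length` counts the prompt tokens too."""
    # Calling the tokenizer returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors='pt').to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            temperature=temperature,
            top_p=top_p,
            do_sample=True,  # sampling must be on for temperature/top_p to take effect
            pad_token_id=tokenizer.eos_token_id  # GPT-2 has no dedicated pad token
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)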

# Generate text
prompt = "The future of artificial intelligence is"
generated = generate_text(gpt2_model, gpt2_tokenizer, prompt)
print(f"\n📝 Generated Text:\n{generated}")
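
The `temperature` argument used above rescales the model's next-token scores before sampling. The following is a minimal pure-Python sketch of that idea, not the library's implementation; `softmax_with_temperature` and `toy_logits` are hypothetical names and values chosen for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores for three candidate tokens
toy_logits = [2.0, 1.0, 0.1]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures flatten the distribution so that less likely tokens are sampled more often.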

## 2. Decoding Strategies

In [None]:
def compare_decoding(prompt):
    """Compare different decoding strategies."""
    inputs = gpt2_tokenizer.encode(prompt, return_tensors='pt').to(device)
    
    # generate() applies top_k=50 by default when sampling, so top_k=0
    # switches it off to isolate the effect of top-p.
    strategies = {
        'Greedy': {'do_sample': False},
        'Temperature (0.5)': {'do_sample': True, 'temperature': 0.5},
        'Temperature (1.5)': {'do_sample': True, 'temperature': 1.5},
        'Top-p (0.9)': {'do_sample': True, 'top_p': 0.9, 'top_k': 0},
        'Top-k (50)': {'do_sample': True, 'top_k': 50}
    }
    
    print(f"Prompt: '{prompt}'\n")
    for name, params in strategies.items():
        with torch.no_grad():
            output = gpt2_model.generate(
                inputs, max_length=50, pad_token_id=gpt2_tokenizer.eos_token_id, **params
            )
        text = gpt2_tokenizer.decode(output[0], skip_special_tokens=True)
        print(f"{name}: {text[:100]}...\n")

compare_decoding("Machine learning is")
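
To make the top-k and top-p strategies above concrete, here is a small sketch that filters a toy probability distribution by hand. It is an illustration under simplifying assumptions (a plain list of probabilities, hypothetical helper names `top_k_filter` and `top_p_filter`) and may differ in edge cases from what `generate` does internally.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

# Hypothetical next-token distribution over five token ids
toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]
print("top-k (k=2):   ", top_k_filter(toy_probs, k=2))
print("top-p (p=0.85):", top_p_filter(toy_probs, p=0.85))
```

Top-k always keeps a fixed number of candidates, whereas top-p keeps more candidates when the distribution is flat and fewer when it is peaked.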

## 3. T5 for Text-to-Text Tasks

In [None]:
# Load T5
t5_tokenizer = T5Tokenizer.from_pretrained('t5-small')
t5_model = T5ForConditionalGeneration.from_pretrained('t5-small').to(device)
t5_model.eval()

print(f"T5-small Parameters: {sum(p.numel() for p in t5_model.parameters()):,}")

In [None]:
def t5_generate(task_prefix, input_text, max_length=100):
    """Generate with T5 using task prefix."""
    input_ids = t5_tokenizer.encode(f"{task_prefix}: {input_text}", return_tensors='pt').to(device)
    
    with torch.no_grad():
        outputs = t5_model.generate(input_ids, max_length=max_length)
    
    return t5_tokenizer.decode(outputs[0], skip_special_tokens=True)

# Summarization
article = "Machine learning is a subset of artificial intelligence that enables computers to learn from data without being explicitly programmed. It uses algorithms to find patterns in data and make predictions."
summary = t5_generate("summarize", article)
print(f"📄 Summary: {summary}")

# Translation
english = "Hello, how are you today?"
german = t5_generate("translate English to German", english)
print(f"🇩🇪 German: {german}")
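
Because the task is selected purely by the text prefix, it can help to keep the prefixes in one place and validate them before calling the model. The sketch below is one possible way to do that; the dictionary keys and the `build_t5_input` helper are hypothetical, and the prefix list is a partial selection of those used by the original T5 checkpoints.

```python
# Prefixes the original T5 checkpoints were trained with (partial list).
# The short keys on the left are made up for this example.
T5_PREFIXES = {
    "summarize": "summarize",
    "en_to_de": "translate English to German",
    "en_to_fr": "translate English to French",
    "en_to_ro": "translate English to Romanian",
}

def build_t5_input(task, text):
    """Return the prefixed input string for a task key, or raise on an unknown task."""
    if task not in T5_PREFIXES:
        raise KeyError(f"Unknown task '{task}'; choose from {sorted(T5_PREFIXES)}")
    return f"{T5_PREFIXES[task]}: {text.strip()}"

print(build_t5_input("en_to_fr", "Hello, how are you today?"))
```

The resulting string has the same `"<prefix>: <text>"` shape that `t5_generate` builds before tokenizing.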

## 4. Controlled Generation

In [None]:
def generate_with_keywords(prompt, keywords, max_length=100):
    """Nudge generation toward keywords by appending them to the prompt (a soft hint, not a hard constraint)."""
    keyword_str = ', '.join(keywords)
    full_prompt = f"{prompt} (keywords: {keyword_str})"
    return generate_text(gpt2_model, gpt2_tokenizer, full_prompt, max_length)

# Generate with topic guidance
result = generate_with_keywords(
    "Write about technology:",
    ['innovation', 'future', 'AI']
)
print(f"📝 Controlled generation:\n{result}")
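
Appending keywords to the prompt does not guarantee that they appear in the output, so it is worth checking the result. The following sketch shows one simple way to do so; `keyword_coverage` and the `sample` string are hypothetical and stand in for real model output.

```python
import re

def keyword_coverage(text, keywords):
    """Return (found, missing) keyword lists using case-insensitive whole-word matching."""
    found, missing = [], []
    for kw in keywords:
        pattern = rf"\b{re.escape(kw)}\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
        else:
            missing.append(kw)
    return found, missing

# Made-up text standing in for a generated continuation
sample = "Innovation in AI will shape how we work."
found, missing = keyword_coverage(sample, ['innovation', 'future', 'AI'])
print(f"Found: {found} | Missing: {missing}")
```

A check like this could be used to re-sample when important keywords are missing.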

## 5. Code Generation

In [None]:
# Using code-specific model
try:
    code_generator = pipeline('text-generation', model='Salesforce/codegen-350M-mono', device=0 if torch.cuda.is_available() else -1)
    
    code_prompt = "def calculate_fibonacci(n):"
    generated_code = code_generator(code_prompt, max_length=100)[0]['generated_text']
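    print(f"💻 Generated Code:\n{generated_code}")
except Exception as e:  # e.g. download failure or insufficient memory
    print(f"Could not run the code model ({e}).")
    print("For code generation, try: codegen, starcoder, or codellama")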
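
Generated code is often cut off at `max_length`, so a quick syntax check before using it can be helpful. This is a minimal sketch using the standard-library `ast` module; `is_valid_python` and the two example strings are hypothetical and are not output from the model above.

```python
import ast

def is_valid_python(source):
    """Return True if `source` parses as Python, False otherwise."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Made-up examples of a complete and a truncated generation
complete = (
    "def calculate_fibonacci(n):\n"
    "    return n if n < 2 else calculate_fibonacci(n - 1) + calculate_fibonacci(n - 2)\n"
)
truncated = "def calculate_fibonacci(n):\n    if n <"

print("complete parses: ", is_valid_python(complete))
print("truncated parses:", is_valid_python(truncated))
```

Parsing only confirms that the syntax is valid; it says nothing about whether the code is correct.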

## 6. Model Comparison

In [None]:
import pandas as pd

comparison = pd.DataFrame({
    'Model': ['GPT-2', 'GPT-3', 'T5', 'FLAN-T5', 'LLaMA 2', 'Claude'],
    'Parameters': ['124M-1.5B', '175B', '60M-11B', '80M-11B', '7B-70B', 'Undisclosed'],
    'Type': ['Decoder', 'Decoder', 'Enc-Dec', 'Enc-Dec', 'Decoder', 'Decoder'],
    'Open': ['Yes', 'No', 'Yes', 'Yes', 'Yes', 'No'],
    'Best For': ['General', 'Complex', 'Multitask', 'Instructions', 'Open-source', 'Safety']
})

print("📊 LLM Comparison:")
display(comparison)

## 7. Production Deployment

In [None]:
print("🚀 Deployment Options:")
print("  1. HuggingFace Inference API - Easy, pay-per-use")
print("  2. vLLM - Fast local inference with PagedAttention")
print("  3. TensorRT-LLM - NVIDIA optimized")
print("  4. GGML/llama.cpp - CPU inference")
print("  5. Ollama - Simple local deployment")

print("\n💰 Cost Analysis (1M tokens/day):")
print("  OpenAI GPT-4: ~$60/day")
print("  OpenAI GPT-3.5: ~$2/day")
print("  Self-hosted LLaMA 7B: ~$50/month (GPU)")
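
The API figures above follow from simple arithmetic: tokens per day, divided by 1,000, times a per-1K-token price. The sketch below reproduces that calculation. The per-1K prices are assumptions chosen to match the estimates above, not current list prices, and `daily_cost` is a hypothetical helper.

```python
def daily_cost(tokens_per_day, price_per_1k_tokens):
    """Daily API cost in dollars for a flat per-1K-token price."""
    return tokens_per_day / 1000 * price_per_1k_tokens

# Assumed per-1K-token prices (illustrative only; check the provider's pricing page)
assumed_prices = {"GPT-4": 0.06, "GPT-3.5": 0.002}
tokens_per_day = 1_000_000

for model_name, price in assumed_prices.items():
    per_day = daily_cost(tokens_per_day, price)
    print(f"{model_name}: ${per_day:.2f}/day, ${per_day * 30:.2f}/month")
```

Expressing the API costs per month makes them easier to compare with the monthly self-hosting estimate.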

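## 🎯 Key Takeaways
1. Temperature controls randomness: lower values give more predictable text, higher values more varied text
2. Top-p (nucleus) sampling drops the unlikely tail of the distribution, which tends to improve quality
3. T5 task prefixes let one model handle several tasks, such as summarization and translation
4. Quantization reduces memory requirements for deployment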
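
As a rough illustration of why quantization matters for deployment, the sketch below estimates weight storage at different precisions. It is a back-of-the-envelope calculation that assumes memory is simply parameters times bytes per parameter, and it ignores activations, the KV cache, and quantization overhead; `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params, precision):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"7B model @ {precision}: {weight_memory_gb(7e9, precision):.1f} GB")
```

Under these assumptions, each halving of precision halves the memory needed for the weights.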

## 📚 Further Reading
- Radford et al., "Language Models are Unsupervised Multitask Learners" (GPT-2)
- Raffel et al., "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (T5)
- Touvron et al., "LLaMA: Open and Efficient Foundation Language Models"