Learn transformer architecture concepts through hands-on visualizations and step-by-step mathematical analysis.
View Interactive Tutorials
File: transformer-basics.html
Essential foundation for understanding modern AI - from the revolutionary breakthrough to why transformers work so well:
- The problem with RNNs and CNNs - why sequential processing was a bottleneck
- The attention breakthrough - "Attention is All You Need" explained simply
- Core architecture components - interactive exploration of transformer building blocks
- Three paradigms - Encoder-only (BERT), Decoder-only (GPT), Encoder-Decoder (T5)
- Evolution timeline - from 2017 research to ChatGPT revolution
- Interactive comparisons - see why transformers won over previous architectures
Key Concepts: Attention mechanism, parallel processing, architectural paradigms, AI evolution
File: architecture-comparison.html
Comprehensive comparison of modern LLM architectures across the industry:
- Real model analysis - GPT-4, Claude, Gemini, LLaMA, Qwen, DeepSeek architectures
- Design decisions breakdown - why different companies made different choices
- Performance vs efficiency trade-offs - computational costs and capabilities
- Architecture evolution - from academic research to production systems
- Interactive model explorer - compare specifications side-by-side
- Future trends analysis - where LLM architectures are heading
Key Concepts: Model comparison, design trade-offs, production considerations, architectural evolution
File: qkv-matrices.html
Interactive exploration of attention mechanism matrix sizes and their relationship to model architecture:
- Real model comparisons (GPT-4, Claude Sonnet 4, Gemini, DeepSeek, LLaMA)
- Matrix size calculations showing how d and m affect memory/computation
- Architecture analysis with concrete memory requirements
- Visual demonstrations of parameter scaling
Key Concepts: Attention matrices, model dimensions, memory scaling, architecture comparison
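To give a feel for the kind of arithmetic this tutorial walks through, here is a minimal back-of-the-envelope sketch of Q/K/V projection sizes. The dimension values and byte widths below are illustrative assumptions, not the specs of any particular model covered in the tutorial:

```python
# Back-of-the-envelope Q/K/V projection sizing for a single attention layer.
# d_model, n_heads, and bytes_per_param are illustrative values, not a real model's specs.
d_model = 4096          # hidden size (often written d)
n_heads = 32            # attention heads
d_head = d_model // n_heads
bytes_per_param = 2     # fp16/bf16 storage

# Each of W_Q, W_K, W_V maps d_model -> n_heads * d_head (= d_model here).
params_per_proj = d_model * d_model
total_qkv_params = 3 * params_per_proj

print(f"Per-projection parameters: {params_per_proj:,}")
print(f"Q+K+V parameters per layer: {total_qkv_params:,}")
print(f"Q+K+V weight memory per layer: {total_qkv_params * bytes_per_param / 1e6:.1f} MB")
```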
File: lora-tutorial.html
Complete mathematical foundation of Low-Rank Adaptation - the breakthrough technique for efficient fine-tuning:
- LoRA core equation - W = W₀ + BA with step-by-step derivation
- Interactive parameter calculator - real-time memory savings and parameter reduction
- Matrix decomposition visualizer - see how low-rank approximation works
- Layer targeting strategies - which layers to adapt (Q,K,V vs FFN analysis)
- Rank selection guidance - optimal rank for different model sizes and tasks
- Real model examples - LLaMA, Mistral, Mixtral with actual specifications
- Production deployment - multi-tenant serving and adapter swapping
Key Concepts: Low-rank decomposition, parameter efficiency, rank selection, adapter strategies
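As a rough illustration of the equation and the parameter savings the tutorial derives, here is a minimal NumPy sketch of the low-rank update W = W₀ + BA. The matrix sizes and rank are placeholder values, not the tutorial's presets:

```python
import numpy as np

# Minimal LoRA sketch: the adapted weight is W0 + B @ A, with illustrative sizes.
d_out, d_in, rank = 4096, 4096, 8

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d_out, d_in)) * 0.02   # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01     # trainable, rank x d_in
B = np.zeros((d_out, rank))                      # trainable, zero-initialized so training starts at W0

def lora_forward(x):
    """y = W0 x + B(Ax): the frozen base path plus the low-rank update."""
    return W0 @ x + B @ (A @ x)

y = lora_forward(rng.standard_normal(d_in))

full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"Full weight params: {full_params:,}")
print(f"LoRA params (rank {rank}): {lora_params:,} "
      f"({100 * lora_params / full_params:.2f}% of full)")
```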
File: finetuning-comparison.html
Master the complete spectrum of fine-tuning approaches - from full parameter updates to efficient adaptation:
- Mathematical comparison - full update equations vs selective LoRA updates
- Interactive layer freezing - strategic freezing for balanced efficiency/performance
- Catastrophic forgetting analysis - risk assessment and mitigation strategies
- Memory & cost calculator - real hardware requirements and cloud costs
- Decision framework - smart advisor for choosing optimal approach
- Training speed analysis - time and efficiency comparisons
- Production strategies - multi-task serving and deployment patterns
Key Concepts: Full fine-tuning mathematics, layer freezing, catastrophic forgetting, resource optimization
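For a concrete taste of the layer-freezing idea, here is a small sketch using PyTorch (assumed installed) on a toy stack of linear layers standing in for transformer blocks. It is not the tutorial's interactive tool, just the underlying mechanic:

```python
import torch.nn as nn

# Illustrative layer freezing on a toy 8-block stack; only the last two blocks stay trainable.
model = nn.Sequential(*[nn.Linear(512, 512) for _ in range(8)])

for layer in model[:-2]:
    for p in layer.parameters():
        p.requires_grad = False    # frozen: no gradients, no optimizer state

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)")
```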
File: advanced-peft.html
Cutting-edge Parameter-Efficient Fine-Tuning techniques for maximum efficiency:
- QLoRA deep dive - 4-bit quantization + LoRA mathematics
- DoRA analysis - Weight-Decomposed Low-Rank Adaptation
- AdaLoRA - adaptive rank allocation during training
- Modern optimizations - LoRA+, Delta-LoRA, and latest research
- Quantization strategies - INT8, INT4, and mixed-precision approaches
- Production deployment - serving quantized models efficiently
- Performance benchmarks - comprehensive comparison across techniques
Key Concepts: Quantization mathematics, advanced PEFT, deployment optimization, cutting-edge research
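To make the quantization idea concrete, the sketch below does a simple symmetric INT8 round trip on a random weight matrix. This is deliberately much simpler than QLoRA's NF4 scheme covered in the tutorial and is only meant to show the quantize/dequantize trade-off:

```python
import numpy as np

# Toy per-tensor symmetric INT8 quantization: one scale for the whole matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32) * 0.02

scale = np.abs(W).max() / 127.0
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale

print(f"fp32 size: {W.nbytes / 1e6:.1f} MB, int8 size: {W_int8.nbytes / 1e6:.1f} MB")
print(f"Mean absolute error after round trip: {np.abs(W - W_dequant).mean():.6f}")
```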
File: rope-tutorial.html
Comprehensive guide to understanding how transformers encode position information through rotation:
- Visual dimension pairing with color-coded examples
- Complete mathematical walkthrough with cos/sin transformations
- Interactive examples with up to 128D embeddings and 128 token contexts
- Context extension challenges and scaling analysis
- Step-by-step RoPE application with real token examples
Key Concepts: Position encoding, dimension pairs, rotation mathematics, context scaling
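As a compact preview of the rotation mathematics, here is a minimal NumPy sketch that rotates each (even, odd) dimension pair of a token embedding by a position-dependent angle, using the common 10000-base frequency convention. The input vector is random and purely illustrative:

```python
import numpy as np

def apply_rope(x, position, base=10000.0):
    """Rotate each (even, odd) dimension pair of x by position * theta_i."""
    d = x.shape[-1]
    half = d // 2
    inv_freq = base ** (-np.arange(half) * 2.0 / d)   # theta_i = base^(-2i/d), one per pair
    angles = position * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin            # standard 2-D rotation per pair
    out[1::2] = x_even * sin + x_odd * cos
    return out

x = np.random.default_rng(0).standard_normal(8)
print(apply_rope(x, position=0))   # position 0: angles are zero, vector is unchanged
print(apply_rope(x, position=5))   # position 5: each pair is rotated
```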
File: complete-attention-mechanism.html
Interactive step-by-step walkthrough of how Q, K, V matrices work together in transformer attention:
- Matrix creation process with real token examples
- Q × K^T computation showing compatibility scores
- Softmax normalization converting scores to probabilities
- Attention × V application demonstrating information flow
- Interactive matrix explorer showing individual component impacts
Key Concepts: Q×K^T computation, softmax normalization, attention×V, matrix interactions
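The whole pipeline this tutorial animates fits in a few lines of NumPy; the sketch below computes Q × K^T, applies softmax, and multiplies by V. Shapes and values are illustrative, not taken from the tutorial's token examples:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_q, seq_k) compatibility scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ V, weights                        # attention x V, plus the weight matrix

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 4, 8, 8
Q, K, V = (rng.standard_normal((seq_len, d)) for d in (d_k, d_k, d_v))
out, w = attention(Q, K, V)
print(w.round(3))      # each row: how much one token attends to every token
print(out.shape)       # (4, 8)
```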
File: attention-evolution.html
Complete evolution of attention mechanisms from Multi-Head Attention through Grouped Query Attention to Multi-Head Latent Attention:
- KV caching foundation - universal optimization across all attention mechanisms
- Memory scaling analysis with exact calculations for different architectures
- Evolution timeline from MHA (2017) → MQA (2019) → GQA (2023) → MLA (2024)
- Interactive comparisons showing memory savings and trade-offs
- Deep dive into MLA with compression/decompression mathematics
- Real model configurations (GPT, LLaMA, Qwen, DeepSeek) with memory analysis
Key Concepts: KV caching, memory optimization, grouped attention, compression techniques, evolution timeline
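The memory-scaling analysis comes down to a simple formula: the KV cache stores one K and one V tensor per layer, sized by the number of KV heads. The sketch below compares MHA with a GQA-style configuration using placeholder numbers, not the specs of any model from the tutorial:

```python
# Back-of-the-envelope KV-cache sizing for MHA vs GQA (illustrative numbers only).
def kv_cache_bytes(n_layers, n_kv_heads, d_head, seq_len, batch=1, bytes_per_elem=2):
    # K and V are each (batch, n_kv_heads, seq_len, d_head) per layer.
    return 2 * n_layers * batch * n_kv_heads * seq_len * d_head * bytes_per_elem

n_layers, n_heads, d_head, seq_len = 32, 32, 128, 8192
mha = kv_cache_bytes(n_layers, n_kv_heads=n_heads, d_head=d_head, seq_len=seq_len)
gqa = kv_cache_bytes(n_layers, n_kv_heads=8, d_head=d_head, seq_len=seq_len)  # 8 KV groups
print(f"MHA cache: {mha / 1e9:.2f} GB, GQA cache: {gqa / 1e9:.2f} GB "
      f"({mha / gqa:.0f}x reduction)")
```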
File: text-generation-process.html
Complete mathematical walkthrough from attention output to next token prediction:
- Feed-forward network computation with exact matrix dimensions
- Layer normalization & residual connections mathematical analysis
- Output projection to vocabulary showing the largest matrix operation
- Sampling strategies (temperature, top-k, top-p) with live probability visualization
- Performance analysis including memory, bandwidth, and FLOPs per operation
- Real model presets (GPT-2, LLaMA, Qwen, DeepSeek) with exact specifications
Key Concepts: FFN computation, matrix flows, vocabulary logits, sampling strategies, performance analysis
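To illustrate the final step of the pipeline, here is a small sampling sketch over a made-up 6-token vocabulary: temperature scaling followed by top-k filtering. The logit values are arbitrary and only meant to show the mechanics:

```python
import numpy as np

def sample(logits, temperature=1.0, top_k=3, rng=np.random.default_rng(0)):
    """Temperature-scale the logits, keep the top_k, renormalize, and sample one token id."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    cutoff = np.sort(logits)[-top_k]
    logits = np.where(logits >= cutoff, logits, -np.inf)   # everything outside top_k gets prob 0
    probs = np.exp(logits - logits[np.isfinite(logits)].max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.5, 0.3, -1.0, 0.9, -2.5]
token_id, probs = sample(logits, temperature=0.8, top_k=3)
print("probabilities:", probs.round(3))
print("sampled token id:", token_id)
```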
File: mixture-of-experts.html
Interactive exploration of how MoE scales transformer models through sparsity and selective expert activation:
- Dense vs sparse computation analysis with exact parameter calculations
- Router mechanics - how intelligent token assignment works mathematically
- Expert specialization - what each expert learns and emergent behaviors
- Load balancing challenges and solutions (auxiliary loss, Switch Transformer)
- Performance analysis with real model architectures (LLaMA, Qwen, DeepSeek)
- Cost-benefit analysis - economic implications of MoE scaling
- Real-world MoE models - Switch Transformer, GLaM, Mixtral, GPT-4 analysis
- Interactive simulations - route tokens through expert networks
Key Concepts: Sparse computation, expert routing, load balancing, parameter scaling, sparsity benefits
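The router mechanics boil down to a learned linear scoring layer plus a top-k selection. The sketch below routes a few random tokens to their top-2 experts with softmax gates; all sizes and weights are illustrative, not taken from any model in the tutorial:

```python
import numpy as np

# Minimal top-2 MoE routing sketch: score every expert per token, keep the two best.
rng = np.random.default_rng(0)
n_tokens, d_model, n_experts, top_k = 4, 16, 8, 2

tokens = rng.standard_normal((n_tokens, d_model))
W_router = rng.standard_normal((d_model, n_experts)) * 0.1

logits = tokens @ W_router                               # (n_tokens, n_experts)
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
chosen = np.argsort(probs, axis=-1)[:, -top_k:]          # top-2 expert ids per token

for t in range(n_tokens):
    gates = probs[t, chosen[t]]
    gates /= gates.sum()                                 # renormalize over the chosen experts
    print(f"token {t}: experts {chosen[t].tolist()} with gates {gates.round(2).tolist()}")
```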
File: context-length-impact.html
Mathematical analysis of why models trained on long contexts excel at shorter sequences:
- Fixed vs dynamic components in transformer models
- RoPE frequency analysis - what changes and what doesn't
- Performance metrics with concrete speed/memory calculations
- Step-by-step mathematical proofs with real examples
- Interactive comparisons across different context lengths
Key Concepts: Context extension, performance analysis, RoPE frequencies, training vs inference
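The core observation is easy to verify numerically: the weight matrices are fixed regardless of context length, while the per-head attention-score matrix grows quadratically. The sketch below uses illustrative numbers and a rough ~12·d² per-layer parameter estimate, not figures from the tutorial:

```python
# What changes with context length: attention scores grow quadratically,
# while the weight matrices stay the same size. Numbers are illustrative.
d_model, n_layers, bytes_per_elem = 4096, 32, 2
weight_params = 12 * n_layers * d_model ** 2     # rough per-block estimate (attention + 4x FFN)

for seq_len in (2_048, 8_192, 32_768):
    score_elems = seq_len ** 2                   # one attention-score matrix per head, per layer
    print(f"seq_len={seq_len:>6}: weights {weight_params / 1e9:.1f}B params (unchanged), "
          f"per-head score matrix {score_elems * bytes_per_elem / 1e6:.0f} MB")
```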
visual-ai-tutorials/
├── index.html                          # Landing page with tutorial links
├── transformer-basics.html             # Transformer basics tutorial [NEW]
├── architecture-comparison.html        # Architecture comparison tutorial [NEW]
├── qkv-matrices.html                   # Q, K, V matrix tutorial
├── lora-tutorial.html                  # LoRA mathematics tutorial [NEW SERIES]
├── finetuning-comparison.html          # Full fine-tuning vs LoRA [NEW SERIES]
├── advanced-peft.html                  # Advanced PEFT techniques [NEW SERIES]
├── rope-tutorial.html                  # RoPE tutorial
├── complete-attention-mechanism.html   # Complete attention mechanism
├── attention-evolution.html            # Attention mechanisms evolution
├── text-generation-process.html        # Text generation process
├── mixture-of-experts.html             # Mixture of Experts
├── context-length-impact.html          # Context length impact tutorial
└── README.md                           # This file
- AI/ML Engineers learning transformer internals and fine-tuning strategies
- Researchers studying attention mechanisms, PEFT techniques, and position encoding
- Students in NLP/deep learning courses
- Developers working with LLMs who want to understand underlying mathematics
- Practitioners fine-tuning models for production deployment
- Anyone curious about how modern AI models like GPT, Claude, and Gemini work
- Responsive Design - Works on desktop, tablet, and mobile
- Interactive Visualizations - Real-time calculations and demonstrations
- Mathematical Precision - Step-by-step formulas with actual numbers
- Real Model Data - Architecture specs from production models
- Configurable Examples - Adjust parameters to see immediate effects
- Educational Focus - Designed for learning, not just reference
- Production Ready - Deployment strategies and resource planning
Simply visit the live demo to access all tutorials immediately.
- Clone this repository:
  `git clone https://github.com/profitmonk/visual-ai-tutorials.git`
  `cd visual-ai-tutorials`
- Open `index.html` in your browser, or serve it with a local server:
  `python -m http.server 8000` (then visit `http://localhost:8000`)
- Real Architecture Data: Actual specs from GPT-4, Claude Sonnet 4, Gemini 2.5 Pro, LLaMA, Qwen, DeepSeek
- Interactive Math: See formulas in action with adjustable parameters
- Visual Learning: Color-coded matrices, dimension pairing, rotation visualizations
- Concrete Examples: Real token sequences, actual memory calculations, exact FLOP counts
- Progressive Complexity: Build understanding step-by-step
- Performance Analysis: Memory usage, bandwidth requirements, computational bottlenecks
- Production Focus: Real deployment strategies and resource planning
After completing these tutorials, you'll understand:
- Foundation: Why transformers revolutionized AI and how they work fundamentally
- Architecture Design: How different companies approach LLM architecture and trade-offs
- Fine-tuning Mastery: Complete spectrum from full fine-tuning to advanced PEFT techniques
- LoRA Mathematics: Low-rank decomposition, parameter efficiency, and optimal strategies
- Resource Optimization: Memory, compute, and cost analysis for production deployment
- How RoPE encodes position through rotation mathematics
- Why attention matrices scale quadratically with sequence length
- How model dimensions affect memory and computation requirements
- The evolution of attention mechanisms and memory optimization techniques
- How KV caching works universally across all attention variants
- How MoE enables massive parameter scaling through sparse computation
- The mathematics of expert routing and load balancing
- Trade-offs between memory, computation, and model quality in MoE systems
- The complete flow from attention output to next token prediction
- How feed-forward networks transform representations
- Why models trained on long contexts work better on short contexts
- The relationship between training and inference in transformer models
- Exact computational requirements for real transformer models
For maximum understanding, follow this order:
1. Transformer Basics - Understand the revolutionary breakthrough and foundation
2. Architecture Comparison - Learn how modern LLMs differ and why
3. Q, K, V Matrix Dimensions - Understand the basic building blocks
4. RoPE: Rotary Position Embedding - Learn how position is encoded
5. Complete Attention Mechanism - See how Q, K, V work together
6. Attention Mechanisms Evolution - Learn memory optimization and scaling techniques
7. Text Generation Process - Complete pipeline from attention to tokens
8. LoRA Mathematics - Master the most popular PEFT technique
9. Full Fine-tuning vs LoRA - Complete comparison and decision framework
10. Advanced PEFT - Cutting-edge techniques (QLoRA, DoRA, etc.)
11. Mixture of Experts - Advanced scaling through sparse computation
12. Context Length Impact - Advanced concepts about training vs inference
Contributions are welcome! Whether it's:
- Bug fixes
- New tutorial topics
- Documentation improvements
- UI/UX enhancements
- Additional model architectures
Please feel free to open issues or submit pull requests.
This project is open source and available under the MIT License.
- Built with educational focus to demystify transformer architecture and fine-tuning
- Inspired by the need for visual, interactive explanations of complex AI concepts
- Mathematical content based on original research papers and production model specifications
- Fine-tuning tutorials address the practical gap between theory and implementation
- GitHub Issues: For bugs, feature requests, or questions
- Discussions: For general questions about transformer architecture and fine-tuning
- Pull Requests: For contributions
⭐ Star this repository if these tutorials helped you understand transformers and fine-tuning better!