Learn transformer architecture concepts through hands-on visualizations and step-by-step mathematical analysis.
View Interactive Tutorials
File: transformer-basics.html
Essential foundation for understanding modern AI - from the revolutionary breakthrough to why transformers work so well:
- The problem with RNNs and CNNs - why sequential processing was a bottleneck
- The attention breakthrough - "Attention is All You Need" explained simply
- Core architecture components - interactive exploration of transformer building blocks
- Three paradigms - Encoder-only (BERT), Decoder-only (GPT), Encoder-Decoder (T5)
- Evolution timeline - from 2017 research to ChatGPT revolution
- Interactive comparisons - see why transformers won over previous architectures
Key Concepts: Attention mechanism, parallel processing, architectural paradigms, AI evolution
File: architecture-comparison.html
Comprehensive comparison of modern LLM architectures across the industry:
- Real model analysis - GPT-4, Claude, Gemini, LLaMA, Qwen, DeepSeek architectures
- Design decisions breakdown - why different companies made different choices
- Performance vs efficiency trade-offs - computational costs and capabilities
- Architecture evolution - from academic research to production systems
- Interactive model explorer - compare specifications side-by-side
- Future trends analysis - where LLM architectures are heading
Key Concepts: Model comparison, design trade-offs, production considerations, architectural evolution
File: qkv-matrices.html
Interactive exploration of attention mechanism matrix sizes and their relationship to model architecture:
- Real model comparisons (GPT-4, Claude Sonnet 4, Gemini, DeepSeek, LLaMA)
- Matrix size calculations showing how d and m affect memory/computation
- Architecture analysis with concrete memory requirements
- Visual demonstrations of parameter scaling
Key Concepts: Attention matrices, model dimensions, memory scaling, architecture comparison
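To give a feel for the kind of arithmetic this tutorial walks through, here is a minimal back-of-the-envelope sketch of Q/K/V projection sizes. The dimension values and byte widths below are illustrative assumptions, not the specs of any particular model covered in the tutorial:

```python
# Back-of-the-envelope Q/K/V projection sizing for a single attention layer.
# d_model, n_heads, and bytes_per_param are illustrative values, not a real model's specs.
d_model = 4096          # hidden size (often written d)
n_heads = 32            # attention heads
d_head = d_model // n_heads
bytes_per_param = 2     # fp16/bf16 storage

# Each of W_Q, W_K, W_V maps d_model -> n_heads * d_head (= d_model here).
params_per_proj = d_model * d_model
total_qkv_params = 3 * params_per_proj

print(f"Per-projection parameters: {params_per_proj:,}")
print(f"Q+K+V parameters per layer: {total_qkv_params:,}")
print(f"Q+K+V weight memory per layer: {total_qkv_params * bytes_per_param / 1e6:.1f} MB")
```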
File: lora-tutorial.html
Complete mathematical foundation of Low-Rank Adaptation - the breakthrough technique for efficient fine-tuning:
- LoRA core equation - W = W₀ + BA with step-by-step derivation
- Interactive parameter calculator - real-time memory savings and parameter reduction
- Matrix decomposition visualizer - see how low-rank approximation works
- Layer targeting strategies - which layers to adapt (Q,K,V vs FFN analysis)
- Rank selection guidance - optimal rank for different model sizes and tasks
- Real model examples - LLaMA, Mistral, Mixtral with actual specifications
- Production deployment - multi-tenant serving and adapter swapping
Key Concepts: Low-rank decomposition, parameter efficiency, rank selection, adapter strategies
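As a rough illustration of the equation and the parameter savings the tutorial derives, here is a minimal NumPy sketch of the low-rank update W = W₀ + BA. The matrix sizes and rank are placeholder values, not the tutorial's presets:

```python
import numpy as np

# Minimal LoRA sketch: the adapted weight is W0 + B @ A, with illustrative sizes.
d_out, d_in, rank = 4096, 4096, 8

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d_out, d_in)) * 0.02   # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01     # trainable, rank x d_in
B = np.zeros((d_out, rank))                      # trainable, zero-initialized so training starts at W0

def lora_forward(x):
    """y = W0 x + B(Ax): the frozen base path plus the low-rank update."""
    return W0 @ x + B @ (A @ x)

y = lora_forward(rng.standard_normal(d_in))

full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"Full weight params: {full_params:,}")
print(f"LoRA params (rank {rank}): {lora_params:,} "
      f"({100 * lora_params / full_params:.2f}% of full)")
```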
File: finetuning-comparison.html
Master the complete spectrum of fine-tuning approaches - from full parameter updates to efficient adaptation:
- Mathematical comparison - full update equations vs selective LoRA updates
- Interactive layer freezing - strategic freezing for balanced efficiency/performance
- Catastrophic forgetting analysis - risk assessment and mitigation strategies
- Memory & cost calculator - real hardware requirements and cloud costs
- Decision framework - smart advisor for choosing optimal approach
- Training speed analysis - time and efficiency comparisons
- Production strategies - multi-task serving and deployment patterns
Key Concepts: Full fine-tuning mathematics, layer freezing, catastrophic forgetting, resource optimization
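For a concrete taste of the layer-freezing idea, here is a small sketch using PyTorch (assumed installed) on a toy stack of linear layers standing in for transformer blocks. It is not the tutorial's interactive tool, just the underlying mechanic:

```python
import torch.nn as nn

# Illustrative layer freezing on a toy 8-block stack; only the last two blocks stay trainable.
model = nn.Sequential(*[nn.Linear(512, 512) for _ in range(8)])

for layer in model[:-2]:
    for p in layer.parameters():
        p.requires_grad = False    # frozen: no gradients, no optimizer state

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)")
```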
File: advanced-peft.html
Cutting-edge Parameter-Efficient Fine-Tuning techniques for maximum efficiency:
- QLoRA deep dive - 4-bit quantization + LoRA mathematics
- DoRA analysis - Weight-Decomposed Low-Rank Adaptation
- AdaLoRA - adaptive rank allocation during training
- Modern optimizations - LoRA+, Delta-LoRA, and latest research
- Quantization strategies - INT8, INT4, and mixed-precision approaches
- Production deployment - serving quantized models efficiently
- Performance benchmarks - comprehensive comparison across techniques
Key Concepts: Quantization mathematics, advanced PEFT, deployment optimization, cutting-edge research
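To make the quantization idea concrete, the sketch below does a simple symmetric INT8 round trip on a random weight matrix. This is deliberately much simpler than QLoRA's NF4 scheme covered in the tutorial and is only meant to show the quantize/dequantize trade-off:

```python
import numpy as np

# Toy per-tensor symmetric INT8 quantization: one scale for the whole matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32) * 0.02

scale = np.abs(W).max() / 127.0
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale

print(f"fp32 size: {W.nbytes / 1e6:.1f} MB, int8 size: {W_int8.nbytes / 1e6:.1f} MB")
print(f"Mean absolute error after round trip: {np.abs(W - W_dequant).mean():.6f}")
```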
File: rope-tutorial.html
Comprehensive guide to understanding how transformers encode position information through rotation:
- Visual dimension pairing with color-coded examples
- Complete mathematical walkthrough with cos/sin transformations
- Interactive examples with up to 128D embeddings and 128 token contexts
- Context extension challenges and scaling analysis
- Step-by-step RoPE application with real token examples
Key Concepts: Position encoding, dimension pairs, rotation mathematics, context scaling
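As a compact preview of the rotation mathematics, here is a minimal NumPy sketch that rotates each (even, odd) dimension pair of a token embedding by a position-dependent angle, using the common 10000-base frequency convention. The input vector is random and purely illustrative:

```python
import numpy as np

def apply_rope(x, position, base=10000.0):
    """Rotate each (even, odd) dimension pair of x by position * theta_i."""
    d = x.shape[-1]
    half = d // 2
    inv_freq = base ** (-np.arange(half) * 2.0 / d)   # theta_i = base^(-2i/d), one per pair
    angles = position * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin            # standard 2-D rotation per pair
    out[1::2] = x_even * sin + x_odd * cos
    return out

x = np.random.default_rng(0).standard_normal(8)
print(apply_rope(x, position=0))   # position 0: angles are zero, vector is unchanged
print(apply_rope(x, position=5))   # position 5: each pair is rotated
```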
File: complete-attention-mechanism.html
Interactive step-by-step walkthrough of how Q, K, V matrices work together in transformer attention:
- Matrix creation process with real token examples
- Q × K^T computation showing compatibility scores
- Softmax normalization converting scores to probabilities
- Attention × V application demonstrating information flow
- Interactive matrix explorer showing individual component impacts
Key Concepts: Q×K^T computation, softmax normalization, attention×V, matrix interactions
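The whole pipeline this tutorial animates fits in a few lines of NumPy; the sketch below computes Q × K^T, applies softmax, and multiplies by V. Shapes and values are illustrative, not taken from the tutorial's token examples:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_q, seq_k) compatibility scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ V, weights                        # attention x V, plus the weight matrix

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 4, 8, 8
Q, K, V = (rng.standard_normal((seq_len, d)) for d in (d_k, d_k, d_v))
out, w = attention(Q, K, V)
print(w.round(3))      # each row: how much one token attends to every token
print(out.shape)       # (4, 8)
```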
File: attention-evolution.html
Complete evolution of attention mechanisms from Multi-Head Attention through Grouped Query Attention to Multi-Head Latent Attention:
- KV caching foundation - universal optimization across all attention mechanisms
- Memory scaling analysis with exact calculations for different architectures
- Evolution timeline from MHA (2017) → MQA (2019) → GQA (2023) → MLA (2024)
- Interactive comparisons showing memory savings and trade-offs
- Deep dive into MLA with compression/decompression mathematics
- Real model configurations (GPT, LLaMA, Qwen, DeepSeek) with memory analysis
Key Concepts: KV caching, memory optimization, grouped attention, compression techniques, evolution timeline
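The memory-scaling analysis comes down to a simple formula: the KV cache stores one K and one V tensor per layer, sized by the number of KV heads. The sketch below compares MHA with a GQA-style configuration using placeholder numbers, not the specs of any model from the tutorial:

```python
# Back-of-the-envelope KV-cache sizing for MHA vs GQA (illustrative numbers only).
def kv_cache_bytes(n_layers, n_kv_heads, d_head, seq_len, batch=1, bytes_per_elem=2):
    # K and V are each (batch, n_kv_heads, seq_len, d_head) per layer.
    return 2 * n_layers * batch * n_kv_heads * seq_len * d_head * bytes_per_elem

n_layers, n_heads, d_head, seq_len = 32, 32, 128, 8192
mha = kv_cache_bytes(n_layers, n_kv_heads=n_heads, d_head=d_head, seq_len=seq_len)
gqa = kv_cache_bytes(n_layers, n_kv_heads=8, d_head=d_head, seq_len=seq_len)  # 8 KV groups
print(f"MHA cache: {mha / 1e9:.2f} GB, GQA cache: {gqa / 1e9:.2f} GB "
      f"({mha / gqa:.0f}x reduction)")
```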
File: text-generation-process.html
Complete mathematical walkthrough from attention output to next token prediction:
- Feed-forward network computation with exact matrix dimensions
- Layer normalization & residual connections mathematical analysis
- Output projection to vocabulary showing the largest matrix operation
- Sampling strategies (temperature, top-k, top-p) with live probability visualization
- Performance analysis including memory, bandwidth, and FLOPs per operation
- Real model presets (GPT-2, LLaMA, Qwen, DeepSeek) with exact specifications
Key Concepts: FFN computation, matrix flows, vocabulary logits, sampling strategies, performance analysis
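To illustrate the final step of the pipeline, here is a small sampling sketch over a made-up 6-token vocabulary: temperature scaling followed by top-k filtering. The logit values are arbitrary and only meant to show the mechanics:

```python
import numpy as np

def sample(logits, temperature=1.0, top_k=3, rng=np.random.default_rng(0)):
    """Temperature-scale the logits, keep the top_k, renormalize, and sample one token id."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    cutoff = np.sort(logits)[-top_k]
    logits = np.where(logits >= cutoff, logits, -np.inf)   # everything outside top_k gets prob 0
    probs = np.exp(logits - logits[np.isfinite(logits)].max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.5, 0.3, -1.0, 0.9, -2.5]
token_id, probs = sample(logits, temperature=0.8, top_k=3)
print("probabilities:", probs.round(3))
print("sampled token id:", token_id)
```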
File: mixture-of-experts.html
Interactive exploration of how MoE scales transformer models through sparsity and selective expert activation:
- Dense vs sparse computation analysis with exact parameter calculations
- Router mechanics - how intelligent token assignment works mathematically
- Expert specialization - what each expert learns and emergent behaviors
- Load balancing challenges and solutions (auxiliary loss, Switch Transformer)
- Performance analysis with real model architectures (LLaMA, Qwen, DeepSeek)
- Cost-benefit analysis - economic implications of MoE scaling
- Real-world MoE models - Switch Transformer, GLaM, Mixtral, GPT-4 analysis
- Interactive simulations - route tokens through expert networks
Key Concepts: Sparse computation, expert routing, load balancing, parameter scaling, sparsity benefits
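The router mechanics boil down to a learned linear scoring layer plus a top-k selection. The sketch below routes a few random tokens to their top-2 experts with softmax gates; all sizes and weights are illustrative, not taken from any model in the tutorial:

```python
import numpy as np

# Minimal top-2 MoE routing sketch: score every expert per token, keep the two best.
rng = np.random.default_rng(0)
n_tokens, d_model, n_experts, top_k = 4, 16, 8, 2

tokens = rng.standard_normal((n_tokens, d_model))
W_router = rng.standard_normal((d_model, n_experts)) * 0.1

logits = tokens @ W_router                               # (n_tokens, n_experts)
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
chosen = np.argsort(probs, axis=-1)[:, -top_k:]          # top-2 expert ids per token

for t in range(n_tokens):
    gates = probs[t, chosen[t]]
    gates /= gates.sum()                                 # renormalize over the chosen experts
    print(f"token {t}: experts {chosen[t].tolist()} with gates {gates.round(2).tolist()}")
```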
File: context-length-impact.html
Mathematical analysis of why models trained on long contexts excel at shorter sequences:
- Fixed vs dynamic components in transformer models
- RoPE frequency analysis - what changes and what doesn't
- Performance metrics with concrete speed/memory calculations
- Step-by-step mathematical proofs with real examples
- Interactive comparisons across different context lengths
Key Concepts: Context extension, performance analysis, RoPE frequencies, training vs inference
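The core observation is easy to verify numerically: the weight matrices are fixed regardless of context length, while the per-head attention-score matrix grows quadratically. The sketch below uses illustrative numbers and a rough ~12·d² per-layer parameter estimate, not figures from the tutorial:

```python
# What changes with context length: attention scores grow quadratically,
# while the weight matrices stay the same size. Numbers are illustrative.
d_model, n_layers, bytes_per_elem = 4096, 32, 2
weight_params = 12 * n_layers * d_model ** 2     # rough per-block estimate (attention + 4x FFN)

for seq_len in (2_048, 8_192, 32_768):
    score_elems = seq_len ** 2                   # one attention-score matrix per head, per layer
    print(f"seq_len={seq_len:>6}: weights {weight_params / 1e9:.1f}B params (unchanged), "
          f"per-head score matrix {score_elems * bytes_per_elem / 1e6:.0f} MB")
```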
visual-ai-tutorials/
├── index.html                          # Landing page with tutorial links
├── transformer-basics.html             # Transformer basics tutorial [NEW]
├── architecture-comparison.html        # Architecture comparison tutorial [NEW]
├── qkv-matrices.html                   # Q, K, V matrix tutorial
├── lora-tutorial.html                  # LoRA mathematics tutorial [NEW SERIES]
├── finetuning-comparison.html          # Full fine-tuning vs LoRA [NEW SERIES]
├── advanced-peft.html                  # Advanced PEFT techniques [NEW SERIES]
├── rope-tutorial.html                  # RoPE tutorial
├── complete-attention-mechanism.html   # Complete attention mechanism
├── attention-evolution.html            # Attention mechanisms evolution
├── text-generation-process.html        # Text generation process
├── mixture-of-experts.html             # Mixture of Experts
├── context-length-impact.html          # Context length impact tutorial
└── README.md                           # This file
- AI/ML Engineers learning transformer internals and fine-tuning strategies
- Researchers studying attention mechanisms, PEFT techniques, and position encoding
- Students in NLP/deep learning courses
- Developers working with LLMs who want to understand underlying mathematics
- Practitioners fine-tuning models for production deployment
- Anyone curious about how modern AI models like GPT, Claude, and Gemini work
- Responsive Design - Works on desktop, tablet, and mobile
- Interactive Visualizations - Real-time calculations and demonstrations
- Mathematical Precision - Step-by-step formulas with actual numbers
- Real Model Data - Architecture specs from production models
- Configurable Examples - Adjust parameters to see immediate effects
- Educational Focus - Designed for learning, not just reference
- Production Ready - Deployment strategies and resource planning
Simply visit the live demo to access all tutorials immediately.
- Clone this repository:
  `git clone https://github.com/profitmonk/visual-ai-tutorials.git`
  `cd visual-ai-tutorials`
- Open `index.html` in your browser, or serve it with a local server:
  `python -m http.server 8000` (then visit `http://localhost:8000`)
- Real Architecture Data: Actual specs from GPT-4, Claude Sonnet 4, Gemini 2.5 Pro, LLaMA, Qwen, DeepSeek
- Interactive Math: See formulas in action with adjustable parameters
- Visual Learning: Color-coded matrices, dimension pairing, rotation visualizations
- Concrete Examples: Real token sequences, actual memory calculations, exact FLOP counts
- Progressive Complexity: Build understanding step-by-step
- Performance Analysis: Memory usage, bandwidth requirements, computational bottlenecks
- Production Focus: Real deployment strategies and resource planning
After completing these tutorials, you'll understand:
- Foundation: Why transformers revolutionized AI and how they work fundamentally
- Architecture Design: How different companies approach LLM architecture and trade-offs
- Fine-tuning Mastery: Complete spectrum from full fine-tuning to advanced PEFT techniques
- LoRA Mathematics: Low-rank decomposition, parameter efficiency, and optimal strategies
- Resource Optimization: Memory, compute, and cost analysis for production deployment
- How RoPE encodes position through rotation mathematics
- Why attention matrices scale quadratically with sequence length
- How model dimensions affect memory and computation requirements
- The evolution of attention mechanisms and memory optimization techniques
- How KV caching works universally across all attention variants
- How MoE enables massive parameter scaling through sparse computation
- The mathematics of expert routing and load balancing
- Trade-offs between memory, computation, and model quality in MoE systems
- The complete flow from attention output to next token prediction
- How feed-forward networks transform representations
- Why models trained on long contexts work better on short contexts
- The relationship between training and inference in transformer models
- Exact computational requirements for real transformer models
For maximum understanding, follow this order:
1. Transformer Basics - Understand the revolutionary breakthrough and foundation
2. Architecture Comparison - Learn how modern LLMs differ and why
3. Q, K, V Matrix Dimensions - Understand the basic building blocks
4. RoPE: Rotary Position Embedding - Learn how position is encoded
5. Complete Attention Mechanism - See how Q, K, V work together
6. Attention Mechanisms Evolution - Learn memory optimization and scaling techniques
7. Text Generation Process - Complete pipeline from attention to tokens
8. LoRA Mathematics - Master the most popular PEFT technique
9. Full Fine-tuning vs LoRA - Complete comparison and decision framework
10. Advanced PEFT - Cutting-edge techniques (QLoRA, DoRA, etc.)
11. Mixture of Experts - Advanced scaling through sparse computation
12. Context Length Impact - Advanced concepts about training vs inference
Contributions are welcome! Whether it's:
- Bug fixes
- New tutorial topics
- Documentation improvements
- UI/UX enhancements
- Additional model architectures
Please feel free to open issues or submit pull requests.
This project is open source and available under the MIT License.
- Built with educational focus to demystify transformer architecture and fine-tuning
- Inspired by the need for visual, interactive explanations of complex AI concepts
- Mathematical content based on original research papers and production model specifications
- Fine-tuning tutorials address the practical gap between theory and implementation
- GitHub Issues: For bugs, feature requests, or questions
- Discussions: For general questions about transformer architecture and fine-tuning
- Pull Requests: For contributions
⭐ Star this repository if these tutorials helped you understand transformers and fine-tuning better!