🧠 Interactive Transformer Architecture Tutorials

Learn transformer architecture concepts through hands-on visualizations and step-by-step mathematical analysis.

🚀 Live Demo

👉 View Interactive Tutorials

📚 Available Tutorials

πŸ—οΈ Transformer Basics: The Foundation [NEW]

File: transformer-basics.html

Essential foundation for understanding modern AI - from the revolutionary breakthrough to why transformers work so well:

  • The problem with RNNs and CNNs - why sequential processing was a bottleneck
  • The attention breakthrough - "Attention is All You Need" explained simply
  • Core architecture components - interactive exploration of transformer building blocks
  • Three paradigms - Encoder-only (BERT), Decoder-only (GPT), Encoder-Decoder (T5)
  • Evolution timeline - from 2017 research to ChatGPT revolution
  • Interactive comparisons - see why transformers won over previous architectures

Key Concepts: Attention mechanism, parallel processing, architectural paradigms, AI evolution


📊 Architecture Comparison: Modern LLM Designs [NEW]

File: architecture-comparison.html

Comprehensive comparison of modern LLM architectures across the industry:

  • Real model analysis - GPT-4, Claude, Gemini, LLaMA, Qwen, DeepSeek architectures
  • Design decisions breakdown - why different companies made different choices
  • Performance vs efficiency trade-offs - computational costs and capabilities
  • Architecture evolution - from academic research to production systems
  • Interactive model explorer - compare specifications side-by-side
  • Future trends analysis - where LLM architectures are heading

Key Concepts: Model comparison, design trade-offs, production considerations, architectural evolution


🎯 Q, K, V Matrix Dimensions

File: qkv-matrices.html

Interactive exploration of attention mechanism matrix sizes and their relationship to model architecture:

  • Real model comparisons (GPT-4, Claude Sonnet 4, Gemini, DeepSeek, LLaMA)
  • Matrix size calculations showing how d and m affect memory/computation
  • Architecture analysis with concrete memory requirements
  • Visual demonstrations of parameter scaling

Key Concepts: Attention matrices, model dimensions, memory scaling, architecture comparison
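
As a rough feel for the scaling this tutorial walks through, here is a minimal back-of-the-envelope sketch in Python. The dimension d = 4096, sequence length m = 2048, and fp16 storage are illustrative assumptions, not the specs of any particular model:

    # Back-of-the-envelope sizes for single-head attention buffers.
    # d = model/embedding dimension, m = sequence length (illustrative values).
    d, m = 4096, 2048
    bytes_per_value = 2  # fp16/bf16

    qkv_each = m * d                   # Q, K, V activations: m x d values each
    scores = m * m                     # Q @ K^T attention scores: m x m values

    print(f"Q, K, V each:     {qkv_each * bytes_per_value / 1e6:.1f} MB")
    print(f"Attention scores: {scores * bytes_per_value / 1e6:.1f} MB (grows as m^2)")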


πŸ” LoRA: Low-Rank Adaptation Mathematics [NEW SERIES]

File: lora-tutorial.html

Complete mathematical foundation of Low-Rank Adaptation - the breakthrough technique for efficient fine-tuning:

  • LoRA core equation - W = W₀ + BA with step-by-step derivation
  • Interactive parameter calculator - real-time memory savings and parameter reduction
  • Matrix decomposition visualizer - see how low-rank approximation works
  • Layer targeting strategies - which layers to adapt (Q,K,V vs FFN analysis)
  • Rank selection guidance - optimal rank for different model sizes and tasks
  • Real model examples - LLaMA, Mistral, Mixtral with actual specifications
  • Production deployment - multi-tenant serving and adapter swapping

Key Concepts: Low-rank decomposition, parameter efficiency, rank selection, adapter strategies
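
To illustrate the parameter-efficiency idea behind W = W₀ + BA, here is a minimal counting sketch; the 4096×4096 weight shape and rank r = 16 are arbitrary illustrative values, not the tutorial's presets:

    # LoRA parameter counting: W = W0 + B @ A, with B (d x r) and A (r x k).
    # Shapes below are illustrative, not tied to any specific model.
    d, k, r = 4096, 4096, 16

    full_params = d * k            # parameters updated by full fine-tuning of W
    lora_params = d * r + r * k    # parameters in B and A

    print(f"Full weight matrix: {full_params:,} trainable parameters")
    print(f"LoRA (rank {r}):    {lora_params:,} trainable parameters "
          f"({100 * lora_params / full_params:.2f}% of full)")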


πŸŽ›οΈ Full Fine-tuning vs LoRA: Complete Comparison [NEW SERIES]

File: finetuning-comparison.html

Master the complete spectrum of fine-tuning approaches - from full parameter updates to efficient adaptation:

  • Mathematical comparison - full update equations vs selective LoRA updates
  • Interactive layer freezing - strategic freezing for balanced efficiency/performance
  • Catastrophic forgetting analysis - risk assessment and mitigation strategies
  • Memory & cost calculator - real hardware requirements and cloud costs
  • Decision framework - smart advisor for choosing optimal approach
  • Training speed analysis - time and efficiency comparisons
  • Production strategies - multi-task serving and deployment patterns

Key Concepts: Full fine-tuning mathematics, layer freezing, catastrophic forgetting, resource optimization
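
For a rough sense of why the trainable-parameter count dominates training memory, here is a hedged sketch that counts only weights, gradients, and Adam optimizer states (it ignores activations, gradient checkpointing, and framework overhead); the 7B model size and 40M adapter size are illustrative assumptions:

    # Rough training-memory comparison: weights + gradients + Adam moments for
    # whatever is trainable. All sizes are illustrative placeholders.
    def training_gb(total_params, trainable_params, bytes_weights=2,
                    bytes_grad=2, bytes_optim=8):
        weights = total_params * bytes_weights      # frozen + trainable weights (fp16)
        grads = trainable_params * bytes_grad       # gradients only for trainable params
        optim = trainable_params * bytes_optim      # Adam m and v states in fp32
        return (weights + grads + optim) / 1e9

    total = 7_000_000_000
    lora_trainable = 40_000_000  # illustrative LoRA adapter size

    print(f"Full fine-tuning: ~{training_gb(total, total):.0f} GB")
    print(f"LoRA:             ~{training_gb(total, lora_trainable):.0f} GB")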


🚀 Advanced PEFT: QLoRA, DoRA & Modern Techniques [NEW SERIES]

File: advanced-peft.html

Cutting-edge Parameter-Efficient Fine-Tuning techniques for maximum efficiency:

  • QLoRA deep dive - 4-bit quantization + LoRA mathematics
  • DoRA analysis - Weight-Decomposed Low-Rank Adaptation
  • AdaLoRA - adaptive rank allocation during training
  • Modern optimizations - LoRA+, Delta-LoRA, and latest research
  • Quantization strategies - INT8, INT4, and mixed-precision approaches
  • Production deployment - serving quantized models efficiently
  • Performance benchmarks - comprehensive comparison across techniques

Key Concepts: Quantization mathematics, advanced PEFT, deployment optimization, cutting-edge research
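
As a toy illustration of the quantization round trip that techniques like QLoRA build on, here is a uniform absmax int4 sketch; note that QLoRA itself uses a non-uniform NF4 code book with double quantization, which this simplified version does not implement:

    import numpy as np

    # Toy 4-bit absmax quantization of a weight block: quantize, dequantize,
    # and measure the reconstruction error. Purely illustrative.
    def quantize_int4(w):
        scale = np.abs(w).max() / 7.0            # symmetric int4 range: -7..7
        q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
        return q, scale

    def dequantize_int4(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(64).astype(np.float32) * 0.02
    q, scale = quantize_int4(w)
    w_hat = dequantize_int4(q, scale)
    print("mean abs error:", np.abs(w - w_hat).mean())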


🌀 RoPE: Rotary Position Embedding

File: rope-tutorial.html

Comprehensive guide to understanding how transformers encode position information through rotation:

  • Visual dimension pairing with color-coded examples
  • Complete mathematical walkthrough with cos/sin transformations
  • Interactive examples with up to 128D embeddings and 128 token contexts
  • Context extension challenges and scaling analysis
  • Step-by-step RoPE application with real token examples

Key Concepts: Position encoding, dimension pairs, rotation mathematics, context scaling
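
A minimal NumPy sketch of the rotation idea, using one common pairing convention (consecutive dimensions) and the standard base of 10000; the 8-dimensional vector and position 3 are arbitrary example values:

    import numpy as np

    # Minimal RoPE sketch: rotate consecutive dimension pairs of a vector by an
    # angle that depends on the token position and the pair index.
    def rope(x, position, base=10000.0):
        d = x.shape[-1]
        half = d // 2
        freqs = base ** (-np.arange(half) / half)      # one frequency per pair
        angles = position * freqs
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[0::2], x[1::2]                      # paired dimensions
        out = np.empty_like(x)
        out[0::2] = x1 * cos - x2 * sin                # 2D rotation per pair
        out[1::2] = x1 * sin + x2 * cos
        return out

    x = np.arange(8, dtype=np.float64)
    print(rope(x, position=3))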


⚡ Complete Attention Mechanism

File: complete-attention-mechanism.html

Interactive step-by-step walkthrough of how Q, K, V matrices work together in transformer attention:

  • Matrix creation process with real token examples
  • Q × K^T computation showing compatibility scores
  • Softmax normalization converting scores to probabilities
  • Attention × V application demonstrating information flow
  • Interactive matrix explorer showing individual component impacts

Key Concepts: Q×K^T computation, softmax normalization, attention×V, matrix interactions
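
The same pipeline can be expressed in a few lines of NumPy; this toy version uses 4 tokens and 8-dimensional heads with random values, so only the shapes and steps mirror the tutorial:

    import numpy as np

    # Single-head scaled dot-product attention on a toy example.
    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(0)
    m, d_k = 4, 8
    Q, K, V = (rng.standard_normal((m, d_k)) for _ in range(3))

    scores = Q @ K.T / np.sqrt(d_k)     # compatibility scores, shape (m, m)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    output = weights @ V                # weighted mix of value vectors, (m, d_k)

    print(weights.round(2))
    print(output.shape)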


🔄 Attention Mechanisms Evolution: MHA → GQA → MLA

File: attention-evolution.html

Complete evolution of attention mechanisms from Multi-Head Attention through Grouped Query Attention to Multi-Head Latent Attention:

  • KV caching foundation - universal optimization across all attention mechanisms
  • Memory scaling analysis with exact calculations for different architectures
  • Evolution timeline from MHA (2017) → MQA (2019) → GQA (2023) → MLA (2024)
  • Interactive comparisons showing memory savings and trade-offs
  • Deep dive into MLA with compression/decompression mathematics
  • Real model configurations (GPT, LLaMA, Qwen, DeepSeek) with memory analysis

Key Concepts: KV caching, memory optimization, grouped attention, compression techniques, evolution timeline
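
A small sketch of how the number of KV heads drives cache size; the layer count, head dimension, and context length below are illustrative placeholders rather than any specific model's configuration:

    # KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes.
    # Head counts illustrate MHA / GQA / MQA layouts, not exact named models.
    def kv_cache_gb(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per=2):
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per / 1e9

    n_layers, head_dim, seq_len = 32, 128, 8192
    for name, n_kv in [("MHA (32 KV heads)", 32), ("GQA (8 KV heads)", 8),
                       ("MQA (1 KV head)", 1)]:
        print(f"{name:<18} {kv_cache_gb(n_layers, n_kv, head_dim, seq_len):.2f} GB")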


🚀 Text Generation Process

File: text-generation-process.html

Complete mathematical walkthrough from attention output to next token prediction:

  • Feed-forward network computation with exact matrix dimensions
  • Layer normalization & residual connections mathematical analysis
  • Output projection to vocabulary showing the largest matrix operation
  • Sampling strategies (temperature, top-k, top-p) with live probability visualization
  • Performance analysis including memory, bandwidth, and FLOPs per operation
  • Real model presets (GPT-2, LLaMA, Qwen, DeepSeek) with exact specifications

Key Concepts: FFN computation, matrix flows, vocabulary logits, sampling strategies, performance analysis
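
A compact sketch of the sampling step over a made-up six-word vocabulary, combining temperature scaling with top-k and top-p filtering; the logits and hyperparameters are arbitrary examples:

    import numpy as np

    # Toy next-token sampling: temperature, then top-k, then top-p (nucleus).
    def sample(logits, temperature=0.8, top_k=4, top_p=0.9, rng=None):
        rng = rng or np.random.default_rng()
        logits = np.asarray(logits, dtype=np.float64) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()

        order = np.argsort(probs)[::-1]                       # most to least likely
        keep = order[:top_k]                                  # top-k cut
        cumulative = np.cumsum(probs[keep])
        cutoff = int(np.searchsorted(cumulative, top_p)) + 1  # smallest nucleus >= top_p
        keep = keep[:cutoff]

        p = probs[keep] / probs[keep].sum()                   # renormalize survivors
        return int(rng.choice(keep, p=p))

    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    logits = [2.0, 1.5, 0.3, 0.2, 1.0, 0.1]
    print(vocab[sample(logits, rng=np.random.default_rng(0))])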


🎯 Mixture of Experts: Scaling Transformers Efficiently

File: mixture-of-experts.html

Interactive exploration of how MoE scales transformer models through sparsity and selective expert activation:

  • Dense vs sparse computation analysis with exact parameter calculations
  • Router mechanics - how intelligent token assignment works mathematically
  • Expert specialization - what each expert learns and emergent behaviors
  • Load balancing challenges and solutions (auxiliary loss, Switch Transformer)
  • Performance analysis with real model architectures (LLaMA, Qwen, DeepSeek)
  • Cost-benefit analysis - economic implications of MoE scaling
  • Real-world MoE models - Switch Transformer, GLaM, Mixtral, GPT-4 analysis
  • Interactive simulations - route tokens through expert networks

Key Concepts: Sparse computation, expert routing, load balancing, parameter scaling, sparsity benefits
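
A toy top-2 router in NumPy showing how gate scores select experts per token; the 8 experts, 16-dimensional tokens, and random gating weights are purely illustrative:

    import numpy as np

    # Toy top-2 MoE router: score tokens against experts, keep the best two
    # experts per token, renormalize their gate weights.
    rng = np.random.default_rng(0)
    n_tokens, d_model, n_experts, top_k = 4, 16, 8, 2

    tokens = rng.standard_normal((n_tokens, d_model))
    W_gate = rng.standard_normal((d_model, n_experts))

    logits = tokens @ W_gate                              # (n_tokens, n_experts)
    logits -= logits.max(axis=-1, keepdims=True)          # stabilize softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

    top_experts = np.argsort(probs, axis=-1)[:, -top_k:]  # best k experts per token
    top_weights = np.take_along_axis(probs, top_experts, axis=-1)
    top_weights /= top_weights.sum(axis=-1, keepdims=True)  # gates sum to 1 per token

    for t in range(n_tokens):
        print(f"token {t}: experts {top_experts[t]}, gates {top_weights[t].round(2)}")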


📊 Context Length Impact: Training vs Inference

File: context-length-impact.html

Mathematical analysis of why models trained on long contexts excel at shorter sequences:

  • Fixed vs dynamic components in transformer models
  • RoPE frequency analysis - what changes and what doesn't
  • Performance metrics with concrete speed/memory calculations
  • Step-by-step mathematical proofs with real examples
  • Interactive comparisons across different context lengths

Key Concepts: Context extension, performance analysis, RoPE frequencies, training vs inference
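
A small sketch contrasting what stays fixed (the weights) with what grows as context increases (KV cache and attention-score work); the 7B parameter count and dimensions are placeholder assumptions:

    # Fixed vs dynamic costs when the same model runs at different context lengths.
    params = 7_000_000_000                  # fixed: weights do not depend on context
    n_layers, n_kv_heads, head_dim = 32, 32, 128

    for ctx in (2_048, 8_192, 32_768):
        kv_cache_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * 2 / 1e9  # K+V, fp16
        score_entries = n_layers * ctx * ctx            # attention scores grow ~ctx^2
        print(f"ctx={ctx:>6}: weights {params/1e9:.0f}B (unchanged), "
              f"KV cache {kv_cache_gb:.2f} GB, score entries {score_entries:.2e}")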

πŸ—οΈ Repository Structure

visual-ai-tutorials/
├── index.html                         # Landing page with tutorial links
├── transformer-basics.html            # Transformer basics tutorial [NEW]
├── architecture-comparison.html       # Architecture comparison tutorial [NEW]
├── qkv-matrices.html                  # Q,K,V matrix tutorial
├── lora-tutorial.html                 # LoRA mathematics tutorial [NEW SERIES]
├── finetuning-comparison.html         # Full fine-tuning vs LoRA [NEW SERIES]
├── advanced-peft.html                 # Advanced PEFT techniques [NEW SERIES]
├── rope-tutorial.html                 # RoPE tutorial
├── complete-attention-mechanism.html  # Complete attention mechanism
├── attention-evolution.html           # Attention mechanisms evolution
├── text-generation-process.html       # Text generation process
├── mixture-of-experts.html            # Mixture of Experts
├── context-length-impact.html         # Context length impact tutorial
└── README.md                          # This file

🎯 Target Audience

  • AI/ML Engineers learning transformer internals and fine-tuning strategies
  • Researchers studying attention mechanisms, PEFT techniques, and position encoding
  • Students in NLP/deep learning courses
  • Developers working with LLMs who want to understand underlying mathematics
  • Practitioners fine-tuning models for production deployment
  • Anyone curious about how modern AI models like GPT, Claude, and Gemini work

✨ Features

  • 📱 Responsive Design - Works on desktop, tablet, and mobile
  • 🎨 Interactive Visualizations - Real-time calculations and demonstrations
  • 🔢 Mathematical Precision - Step-by-step formulas with actual numbers
  • 📊 Real Model Data - Architecture specs from production models
  • 🎛️ Configurable Examples - Adjust parameters to see immediate effects
  • 📚 Educational Focus - Designed for learning, not just reference
  • 💻 Production Ready - Deployment strategies and resource planning

🚀 Getting Started

Option 1: View Online

Simply visit the live demo to access all tutorials immediately.

Option 2: Run Locally

  1. Clone this repository:

    git clone https://github.com/profitmonk/visual-ai-tutorials.git
    cd visual-ai-tutorials
  2. Open index.html in your browser or serve with a local server:

    python -m http.server 8000
    # Then visit http://localhost:8000

📖 Tutorial Highlights

🔥 What Makes These Tutorials Special

  • Real Architecture Data: Actual specs from GPT-4, Claude Sonnet 4, Gemini 2.5 Pro, LLaMA, Qwen, DeepSeek
  • Interactive Math: See formulas in action with adjustable parameters
  • Visual Learning: Color-coded matrices, dimension pairing, rotation visualizations
  • Concrete Examples: Real token sequences, actual memory calculations, exact FLOP counts
  • Progressive Complexity: Build understanding step-by-step
  • Performance Analysis: Memory usage, bandwidth requirements, computational bottlenecks
  • Production Focus: Real deployment strategies and resource planning

🎓 Learning Outcomes

After completing these tutorials, you'll understand:

  • Foundation: Why transformers revolutionized AI and how they work fundamentally
  • Architecture Design: How different companies approach LLM architecture and trade-offs
  • Fine-tuning Mastery: Complete spectrum from full fine-tuning to advanced PEFT techniques
  • LoRA Mathematics: Low-rank decomposition, parameter efficiency, and optimal strategies
  • Resource Optimization: Memory, compute, and cost analysis for production deployment
  • How RoPE encodes position through rotation mathematics
  • Why attention matrices scale quadratically with sequence length
  • How model dimensions affect memory and computation requirements
  • The evolution of attention mechanisms and memory optimization techniques
  • How KV caching works universally across all attention variants
  • How MoE enables massive parameter scaling through sparse computation
  • The mathematics of expert routing and load balancing
  • Trade-offs between memory, computation, and model quality in MoE systems
  • The complete flow from attention output to next token prediction
  • How feed-forward networks transform representations
  • Why models trained on long contexts work better on short contexts
  • The relationship between training and inference in transformer models
  • Exact computational requirements for real transformer models

🎓 Recommended Learning Path

For maximum understanding, follow this order:

🏛️ Foundation Phase

  1. 🏗️ Transformer Basics - Understand the revolutionary breakthrough and foundation
  2. 📊 Architecture Comparison - Learn how modern LLMs differ and why
  3. 🎯 Q, K, V Matrix Dimensions - Understand the basic building blocks
  4. 🌀 RoPE: Rotary Position Embedding - Learn how position is encoded

⚡ Core Mechanisms Phase

  1. ⚡ Complete Attention Mechanism - See how Q, K, V work together
  2. 🔄 Attention Mechanisms Evolution - Learn memory optimization and scaling techniques
  3. 🚀 Text Generation Process - Complete pipeline from attention to tokens

🔧 Fine-tuning Mastery Phase

  1. 🔍 LoRA Mathematics - Master the most popular PEFT technique
  2. 🎛️ Full Fine-tuning vs LoRA - Complete comparison and decision framework
  3. 🚀 Advanced PEFT - Cutting-edge techniques (QLoRA, DoRA, etc.)

🚀 Advanced Topics Phase

  1. 🎯 Mixture of Experts - Advanced scaling through sparse computation
  2. 📊 Context Length Impact - Advanced concepts about training vs inference

🤝 Contributing

Contributions are welcome! Whether it's:

  • 🐛 Bug fixes
  • ✨ New tutorial topics
  • 📝 Documentation improvements
  • 🎨 UI/UX enhancements
  • 📊 Additional model architectures

Please feel free to open issues or submit pull requests.

📄 License

This project is open source and available under the MIT License.

πŸ™ Acknowledgments

  • Built with educational focus to demystify transformer architecture and fine-tuning
  • Inspired by the need for visual, interactive explanations of complex AI concepts
  • Mathematical content based on original research papers and production model specifications
  • Fine-tuning tutorials address the practical gap between theory and implementation

📞 Contact

  • GitHub Issues: For bugs, feature requests, or questions
  • Discussions: For general questions about transformer architecture and fine-tuning
  • Pull Requests: For contributions

⭐ Star this repository if these tutorials helped you understand transformers and fine-tuning better!
