Visualize how transformers think, one attention head at a time.
An interactive, educational web simulation that lets you see inside a transformer model, from tokenization to next-token prediction. No black boxes. No abstract equations. Just a live, explorable pipeline modeled after GPT-2 Small.
Built for students, educators, and anyone curious about how Large Language Models actually work under the hood.
| Feature | Description |
|---|---|
| Token Embedding | Watch input text split into tokens and map to embedding vectors with positional encoding (see the sketch below) |
| Q/K/V Inspector | Examine Query, Key, and Value projections for each attention head |
| Attention Heatmap | Interactive matrix with causal masking; see which tokens attend to which |
| Probability Distribution | Real-time softmax output showing candidate tokens and their probabilities |
| Autoregressive Generation | Generate tokens step by step and observe how each new token reshapes attention |
| Sampling Controls | Tune Temperature (0.3-1.5), Top-k (1-12), and generation length live |
| Multi-Head View | Switch between attention heads to compare learned patterns |
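The Token Embedding step maps each token to a dense vector and adds a positional encoding. A minimal vanilla-JS sketch of that idea (the function names and hashing scheme here are illustrative, not the actual code in app.js):

```js
// Illustrative sketch: hashed token embedding + sinusoidal positional encoding.
// hiddenSize matches the simulation's hidden size of 16.
const hiddenSize = 16;

// Hash a token string into a repeatable pseudo-random embedding vector.
function embedToken(token) {
  let seed = 0;
  for (const ch of token) seed = (seed * 31 + ch.codePointAt(0)) >>> 0;
  return Array.from({ length: hiddenSize }, () => {
    seed = (seed * 1103515245 + 12345) >>> 0; // simple linear congruential step
    return (seed / 0xffffffff) * 2 - 1;       // map to [-1, 1]
  });
}

// Classic sinusoidal positional encoding, added element-wise to the embedding.
function positionalEncoding(pos) {
  return Array.from({ length: hiddenSize }, (_, i) => {
    const angle = pos / Math.pow(10000, (2 * Math.floor(i / 2)) / hiddenSize);
    return i % 2 === 0 ? Math.sin(angle) : Math.cos(angle);
  });
}

function embedWithPosition(token, pos) {
  const emb = embedToken(token);
  const pe = positionalEncoding(pos);
  return emb.map((v, i) => v + pe[i]);
}
```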
**GPT-2 Small (Simulation)**

| Parameter | Value |
|---|---|
| Layers | 12 |
| Attention Heads | 4 (visualized) |
| Hidden Size | 16 |
| Head Dim | 4 |
| Causal Mask | Active |
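In code, the whole spec fits in one small configuration object; a hypothetical example mirroring the table above (field names are illustrative):

```js
// Hypothetical configuration mirroring the spec table above.
const MODEL_CONFIG = {
  name: "GPT-2 Small (Simulation)",
  layers: 12,        // transformer blocks
  numHeads: 4,       // attention heads shown in the visualization
  hiddenSize: 16,    // embedding / hidden dimension
  headDim: 4,        // hiddenSize / numHeads
  causalMask: true,  // tokens may only attend to earlier positions
};
```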
| Technology | Purpose |
|---|---|
| HTML5 | Semantic layout & SVG attention diagrams |
| CSS3 | Custom properties, Grid, Flexbox |
| Vanilla JS | Simulation engine, zero dependencies |
| Google Fonts | Orbitron, Space Grotesk, IBM Plex Mono |
| Vercel | Static hosting & CDN |
No frameworks. No build step. No bundler. Just clean, dependency-free code.
```
simulasiLLM/
├── index.html    # UI layout: toolbar, attention SVG, token stream, panels
├── style.css     # Styling with CSS custom properties
├── app.js        # Engine: tokenizer, attention math, rendering, generation
├── DEPLOY.md     # Deployment guide (Vercel, Cloudflare, Netlify, GH Pages)
└── .github/
    └── workflows/    # CI/CD configuration
```
No install required; just serve the static files:

```bash
git clone https://github.com/romizone/simulasiLLM.git
cd simulasiLLM
python3 -m http.server 8081
```

Open http://127.0.0.1:8081 in your browser.

Alternatively:

```bash
# Node.js
npx serve .

# PHP
php -S localhost:8081
```

See DEPLOY.md for guides on Vercel, Cloudflare Pages, Netlify, and GitHub Pages.
```
Input Text
       │
       ▼
┌─────────────┐
│  Tokenizer  │  Split text into tokens (Unicode-aware regex)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Embedding  │  Map tokens → dense vectors + positional encoding
└──────┬──────┘
       │
       ▼
┌─────────────┐
│    Q/K/V    │  Project embeddings into Query, Key, Value spaces
│  Projection │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Attention  │  score = (Q · Kᵀ) / √d_k → causal mask → softmax
│   Matrix    │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Sampling   │  Apply temperature & top-k filtering
└──────┬──────┘
       │
       ▼
Next Token ──▶ (autoregressive loop)
```
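The first box in the diagram splits the input with a Unicode-aware regex. A tiny illustrative version of that idea (the exact pattern used in app.js may differ):

```js
// Illustrative Unicode-aware tokenizer: runs of letters, runs of digits,
// or single punctuation marks. The 'u' flag enables \p{...} property escapes.
function tokenize(text) {
  return text.match(/\p{L}+|\p{N}+|[^\s\p{L}\p{N}]/gu) ?? [];
}

console.log(tokenize("Transformers attend to context."));
// ["Transformers", "attend", "to", "context", "."]
```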
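The attention box is standard scaled dot-product attention. A self-contained sketch of that math in plain JavaScript, assuming Q and K are arrays of per-token vectors (function names are illustrative):

```js
// Scaled dot-product attention weights with causal masking and row-wise softmax.
// Q and K are arrays of per-token vectors; headDim is the vector length (d_k).
function attentionWeights(Q, K, headDim) {
  const scale = 1 / Math.sqrt(headDim);
  return Q.map((q, i) => {
    // score_ij = (q_i · k_j) / sqrt(d_k); the causal mask blocks j > i
    const scores = K.map((k, j) =>
      j <= i ? scale * q.reduce((s, v, d) => s + v * k[d], 0) : -Infinity
    );
    // numerically stable softmax over each row
    const max = Math.max(...scores);
    const exps = scores.map((s) => Math.exp(s - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / sum);
  });
}

// Example: 3 tokens with headDim = 4; each row is lower-triangular and sums to 1.
const Q = [[1, 0, 0, 0], [0, 1, 0, 0], [0.5, 0.5, 0, 0]];
const K = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0]];
console.log(attentionWeights(Q, K, 4));
```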
This simulation is accompanied by the paper "Simulating the Attention Mechanism in Large Language Models Based on the GPT-2 Architecture" (see the Research Paper link below for the full text).
Topics covered:
- Token processing pipeline (BPE tokenization, embedding, positional encoding)
- Mathematical formulation of scaled dot-product attention
- Causal masking in autoregressive generation
- Temperature scaling and top-k sampling strategies
- GPT-2 Small architecture specifications
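Temperature and top-k work exactly as listed above: temperature rescales the logits before the softmax, and top-k restricts sampling to the k most likely candidates. A minimal illustrative sketch, not the exact code from app.js:

```js
// Sample a token index from raw logits using temperature scaling and top-k filtering.
function sampleNextToken(logits, temperature = 1.0, topK = 5) {
  // 1. Temperature: divide logits, then apply a numerically stable softmax.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);

  // 2. Top-k: keep only the k highest-probability candidates.
  const ranked = probs
    .map((p, id) => ({ id, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, topK);
  const kSum = ranked.reduce((s, t) => s + t.p, 0);

  // 3. Sample from the renormalized top-k distribution.
  let r = Math.random() * kSum;
  for (const { id, p } of ranked) {
    r -= p;
    if (r <= 0) return id;
  }
  return ranked[ranked.length - 1].id; // floating-point fallback
}

// Higher temperature flattens the distribution; a smaller topK makes output more deterministic.
```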
| Resource | Link |
|---|---|
| Live Simulation | simulasillm.vercel.app |
| Research Paper | paper-llm-attention.vercel.app |
| Source Code | github.com/romizone/simulasiLLM |
This project is open source and available for educational purposes.
Made with ❤️ by Romi Nur Ismanto