Aether v2.0 - Major Performance & Quality Upgrade 🚀
🔥 Performance Improvements
CUDA Kernel Acceleration (10x faster!)
- NEW: Triton/TorchScript CUDA kernel for WKV computation
- Training speed: 10x faster (6 hours vs 60 hours for full dataset)
- Inference speed: Runs efficiently on CPU without GPU
- Memory optimized: MAX_LENGTH reduced to 64 for faster training
Benchmark Results:
- Training: 1.4-1.5 steps/s on T4 GPU (was 0.0 steps/s)
- Full 500MB dataset: ~6 hours (was 60+ hours)
- CPU inference: 5-10 tokens/sec on standard hardware
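The benchmark figures can be cross-checked with a little arithmetic. This is a minimal sketch, assuming throughput stays within the reported 1.4-1.5 steps/s for the whole ~6 hour run; the function name `steps_in_run` is illustrative and not part of the release.

```python
def steps_in_run(hours, steps_per_sec):
    """Optimizer steps completed in `hours` at a constant throughput."""
    return round(hours * 3600 * steps_per_sec)

low = steps_in_run(6, 1.4)    # lower bound of the reported T4 throughput
high = steps_in_run(6, 1.5)   # upper bound of the reported T4 throughput
speedup = 60 / 6              # old wall-clock hours / new wall-clock hours

print(low, high, speedup)     # 30240 32400 10.0
```

Under that assumption, a ~6 hour run corresponds to roughly 30-32K training steps, and the 60 hour baseline matches the stated 10x speedup.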
📊 Enhanced Dataset v2.0
Massive Quality & Scale Improvements
- 570K raw text entries (29% of dataset) - 38x increase!
- 1.37M instruction-tuned entries (71% of dataset)
- Proper format separation:
  - Phase 1-2: Pure raw text for pre-training
  - Phase 3+: Structured User:/Aether: format for instruction following
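The two formats can be sketched as follows. This is an illustration only: the helper names, the exact spacing after the role prefixes, and the newline separator are assumptions, and the dataset generator shipped with the release may differ.

```python
def format_raw(text):
    """Phase 1-2: plain text with no role markers."""
    return text.strip()

def format_dialogue(turns):
    """Phase 3+: alternate User:/Aether: prefixes.

    `turns` is a list of (user_message, aether_reply) pairs;
    a single pair gives a single-turn Q&A sample.
    """
    lines = []
    for user_msg, reply in turns:
        lines.append(f"User: {user_msg.strip()}")
        lines.append(f"Aether: {reply.strip()}")
    return "\n".join(lines)

# Hypothetical sample text, written only for this example.
sample = format_dialogue([("Who created you?", "I was created by Konpep.")])
print(sample)
```

Keeping the role markers out of the Phase 1-2 text means the model first learns plain language and only later associates the User:/Aether: prefixes with instruction following.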
Identity Integration
- Aether self-awareness naturally embedded in training data
- Model knows:
  - It's a 51M parameter bilingual model
  - Based on RWKV v4 architecture (14 layers, 640 dimensions)
  - Created by Konpep
  - Speaks fluent Greek and English
- Balanced presence: ~1.4% of dataset (not overwhelming)
Content Distribution
- 60% raw pre-training text (natural language)
- 30% single-turn Q&A
- 10% multi-turn conversations
- 50/50 English/Greek balance
🏗️ Architecture Highlights
Model Specs:
- Parameters: 51.17M
- Layers: 14
- Hidden size: 640
- Architecture: RWKV v4 (linear O(T) complexity)
- Tokenizer: Byte-level BPE, 8,192 vocab
Key Features:
- Linear complexity (not quadratic like Transformers)
- CPU-friendly inference
- No attention matrix required
- Constant memory per token
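To see why these properties hold, here is a minimal single-channel sketch of the RWKV v4 WKV recurrence in plain Python. It is not the code in wkv_cuda_kernel.py: it assumes a positive decay rate `w`, a current-token bonus `u`, and omits the running-max trick that production kernels use for numerical stability. A quadratic reference version is included only to check the result.

```python
import math

def wkv_recurrent(w, u, k, v):
    """O(T) time, O(1) state: only two running sums are carried forward."""
    num, den = 0.0, 0.0           # decayed sums over all past tokens
    decay = math.exp(-w)
    out = []
    for k_t, v_t in zip(k, v):
        cur = math.exp(u + k_t)   # current token gets the bonus u
        out.append((num + cur * v_t) / (den + cur))
        # Fold the current token into the state and decay the past.
        num = decay * num + math.exp(k_t) * v_t
        den = decay * den + math.exp(k_t)
    return out

def wkv_naive(w, u, k, v):
    """O(T^2) reference: recompute the weighted sum over the past at each step."""
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out

k = [0.1, -0.3, 0.5, 0.2]
v = [1.0, 2.0, -1.0, 0.5]
fast = wkv_recurrent(0.5, 0.3, k, v)
slow = wkv_naive(0.5, 0.3, k, v)
```

Both functions produce the same outputs, but the recurrent form never materializes a T x T weight matrix: each new token only updates `num` and `den`, which is what keeps per-token memory constant during inference.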
📦 What's Included
- ✅ Optimized training notebook (Colab-ready)
- ✅ CUDA kernel (wkv_cuda_kernel.py)
- ✅ Enhanced dataset generator v6
- ✅ Interactive inference script
- ✅ Model card & documentation