AETHER 51M Major Performance & Quality Upgrade

@konpep-dev konpep-dev released this 01 Jul 15:09
f08c7a0

Aether v2.0 - Major Performance & Quality Upgrade 🚀

🔥 Performance Improvements

CUDA Kernel Acceleration (10x faster!)

  • NEW: Triton/TorchScript CUDA kernel for WKV computation
  • Training speed: about 10x faster (~6 hours vs. 60+ hours for the full dataset)
  • Inference speed: runs efficiently on CPU; no GPU required
  • Memory optimized: MAX_LENGTH reduced to 64 for faster training
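
For context, the WKV computation that the kernel accelerates can be sketched in plain Python. This is a minimal single-channel reference assuming the standard RWKV v4 formulation (positive decay w, current-token bonus u, keys k, values v). It is not the code in wkv_cuda_kernel.py, and the function names are illustrative.

```python
import math

def wkv_recurrent(w, u, k, v):
    """Single-channel RWKV v4 WKV in O(T) time with O(1) state.

    w: positive decay rate, u: bonus for the current token,
    k, v: lists of per-timestep keys and values (floats).
    """
    # Running numerator/denominator, stored scaled by exp(-max_exp) for stability.
    num, den, max_exp = 0.0, 0.0, -1e38
    out = []
    for k_t, v_t in zip(k, v):
        # Output: past state plus the current token weighted by exp(u + k_t).
        cur = u + k_t
        p = max(max_exp, cur)
        e_past, e_cur = math.exp(max_exp - p), math.exp(cur - p)
        out.append((e_past * num + e_cur * v_t) / (e_past * den + e_cur))
        # State update: decay the past by exp(-w), then add exp(k_t) * v_t.
        decayed = max_exp - w
        p = max(decayed, k_t)
        e_past, e_cur = math.exp(decayed - p), math.exp(k_t - p)
        num = e_past * num + e_cur * v_t
        den = e_past * den + e_cur
        max_exp = p
    return out

def wkv_naive(w, u, k, v):
    """Direct O(T^2) evaluation of the same formula, for checking."""
    out = []
    for t in range(len(k)):
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out
```

The recurrent form carries only three numbers per channel between steps, which is where the linear cost in sequence length comes from. A GPU kernel would typically run this loop in parallel across batch and channel dimensions.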

Benchmark Results:

  • Training: 1.4-1.5 steps/s on T4 GPU (was 0.0 steps/s)
  • Full 500MB dataset: ~6 hours (was 60+ hours)
  • CPU inference: 5-10 tokens/sec on standard hardware
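
As a rough sanity check on these figures, the step rate and wall-clock time imply a total step count. This is simple arithmetic on the numbers above, assuming a constant rate over the whole run; it is not a measured value, and total_steps is an illustrative helper.

```python
def total_steps(steps_per_sec, hours):
    """Number of training steps completed at a constant rate."""
    return steps_per_sec * hours * 3600

# Range implied by 1.4-1.5 steps/s over a ~6 hour run.
low = total_steps(1.4, 6)
high = total_steps(1.5, 6)
print(f"~{low:,.0f} to {high:,.0f} steps in 6 hours")
```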

📊 Enhanced Dataset v2.0

Massive Quality & Scale Improvements

  • 570K raw text entries (29% of dataset) - 38x increase!
  • 1.37M instruction-tuned entries (71% of dataset)
  • Proper format separation:
    • Phase 1-2: Pure raw text for pre-training
    • Phase 3+: Structured User:/Aether: format for instruction following
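
The two formats might be produced along these lines. Only the User: and Aether: prefixes come from the notes above; the exact template (a newline between turns, no special tokens) and the function names are assumptions.

```python
def format_raw(text):
    """Phase 1-2: plain text, no role markers."""
    return text.strip()

def format_dialogue(turns):
    """Phase 3+: alternate User:/Aether: lines from (user, reply) pairs."""
    lines = []
    for user_msg, reply in turns:
        lines.append(f"User: {user_msg.strip()}")
        lines.append(f"Aether: {reply.strip()}")
    return "\n".join(lines)

# A single-turn example; multi-turn conversations pass several pairs.
sample = format_dialogue([("Who created you?", "I was created by Konpep.")])
print(sample)
```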

Identity Integration

  • Aether self-awareness naturally embedded in training data
  • Model knows:
    • It's a 51M parameter bilingual model
    • Based on RWKV v4 architecture (14 layers, 640 dimensions)
    • Created by Konpep
    • Speaks fluent Greek and English
  • Balanced presence: ~1.4% of dataset (not overwhelming)

Content Distribution

  • 60% raw pre-training text (natural language)
  • 30% single-turn Q&A
  • 10% multi-turn conversations
  • 50/50 English/Greek balance
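
One way to realize such a mix when assembling training data is weighted sampling over content types. The sketch below uses the percentages listed above; the category names, the seed, and the sampling approach itself are assumptions rather than the dataset generator's actual logic.

```python
import random

MIX = {"raw_text": 0.60, "single_turn": 0.30, "multi_turn": 0.10}
LANGS = ["en", "el"]  # 50/50 English/Greek

def sample_plan(n, seed=0):
    """Draw n (content_type, language) pairs following the target mix."""
    rng = random.Random(seed)
    kinds = rng.choices(list(MIX), weights=list(MIX.values()), k=n)
    return [(kind, rng.choice(LANGS)) for kind in kinds]

plan = sample_plan(10_000)
# Observed share of each content type; should sit close to the targets.
share = {kind: sum(1 for k, _ in plan if k == kind) / len(plan) for kind in MIX}
print(share)
```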

🏗️ Architecture Highlights

Model Specs:

  • Parameters: 51.17M
  • Layers: 14
  • Hidden size: 640
  • Architecture: RWKV v4 (linear O(T) complexity)
  • Tokenizer: Byte-level BPE, 8,192 vocab

Key Features:

  • Linear complexity in sequence length (standard Transformer attention is quadratic)
  • CPU-friendly inference
  • No attention matrix required
  • Constant memory per generated token (fixed-size recurrent state)
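
To make the constant-memory point concrete, the sketch below compares the size of a recurrent state with a Transformer key/value cache of the same depth and width. It assumes five state vectors per layer, as in common RWKV v4 inference code, and a cache of one key and one value vector per layer per token; neither figure is taken from this release.

```python
N_LAYER, N_EMBD = 14, 640  # from the model specs

def rwkv_state_floats(n_layer=N_LAYER, n_embd=N_EMBD, vectors_per_layer=5):
    """Recurrent state size; independent of sequence length."""
    return n_layer * vectors_per_layer * n_embd

def kv_cache_floats(seq_len, n_layer=N_LAYER, n_embd=N_EMBD):
    """Key + value cache for a same-size Transformer; grows with seq_len."""
    return 2 * n_layer * seq_len * n_embd

# The state column stays fixed while the cache column grows with length.
for t in (64, 1024):
    print(t, rwkv_state_floats(), kv_cache_floats(t))
```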

📦 What's Included

  • ✅ Optimized training notebook (Colab-ready)
  • ✅ CUDA kernel (wkv_cuda_kernel.py)
  • ✅ Enhanced dataset generator v6
  • ✅ Interactive inference script
  • ✅ Model card & documentation