EasyDeL v0.2.0.2

Released by @erfanzar on 03 Jan 12:19 · commit afdcca6 · 140 commits to main since this release

Highlights

  • Unified attention support end-to-end, including new cache structures and eSurge compatibility.
  • Faster GPU/TPU inference via backend-aware attention selection, KV-cache update optimizations, and smarter compilation/batching defaults.

Added

  • Unified attention mechanism across attention layers and generation/scheduler paths.
  • New cache types for unified attention, including UnifiedAttentionCache and related config/view helpers.
  • HybridCache support and expanded unified-attention cache integration.

Performance & Behavior Changes

  • GPU inference now prefers unified_attention and TPU inference prefers ragged_page_attention_v3, with warnings when a suboptimal mechanism is selected.
  • KV-cache updates are optimized for GPU latency (vectorized scatter approach; improved memory donation behavior).
  • eSurge compilation is capped to the scheduler’s actual per-step token budget to reduce startup time for long-context models.
  • runner.compile() now accepts max_num_batched_tokens for fine-grained compilation control.
  • max_num_batched_tokens now auto-defaults per backend (GPU: >= 2048 tokens/step, TPU: >= 8192 tokens/step), with higher defaults on TPU.
  • Performance tuning updates (numexpr threading configuration, JAX PGLE enablement, and XLA GPU flag fixes).
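
The backend-aware behavior listed above can be pictured with a small sketch. This is not EasyDeL's implementation: the helper names (pick_attention, auto_batched_tokens, compile_token_cap) are hypothetical, and it assumes the documented 2048/8192 figures act as lower bounds on an automatically chosen budget. Only the mechanism names and the numbers come from these notes.

```python
import warnings

# Preferred mechanism per backend, as stated in the release notes.
PREFERRED_ATTENTION = {"gpu": "unified_attention", "tpu": "ragged_page_attention_v3"}
# Assumed lower bounds for the auto-chosen per-step token budget.
MIN_BATCHED_TOKENS = {"gpu": 2048, "tpu": 8192}


def pick_attention(backend, requested=None):
    """Hypothetical helper: choose a mechanism, warning on a non-preferred request."""
    preferred = PREFERRED_ATTENTION.get(backend)  # None for unknown backends
    if requested is None:
        return preferred
    if preferred is not None and requested != preferred:
        warnings.warn(
            f"{requested!r} may be suboptimal on {backend}; preferred: {preferred!r}"
        )
    return requested


def auto_batched_tokens(backend, suggested):
    """Hypothetical helper: raise a suggested budget to the backend's floor."""
    return max(suggested, MIN_BATCHED_TOKENS.get(backend, suggested))


def compile_token_cap(max_model_len, max_num_batched_tokens):
    """Hypothetical helper: compile only up to what one scheduler step can hold."""
    return min(max_model_len, max_num_batched_tokens)
```

Under these assumptions, a 131072-token model with an 8192-token step budget would only compile shapes up to 8192 tokens, which is the kind of saving the startup-time item describes.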

Evaluation

  • eSurge lm-eval adapter improvements: exact teacher-forced log-likelihood scoring, rolling-window perplexity, per-request stop sequences, improved greedy_until, and more robust tokenization/chat-template fallbacks.
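
For readers unfamiliar with the scoring terms above, here is a minimal sketch of teacher-forced log-likelihood and rolling-window perplexity. It is not the eSurge adapter's code: the function names and the toy uniform_model are assumptions for illustration, and each window restarts with an empty context as a simplification (real harnesses typically carry context between windows).

```python
import math


def teacher_forced_loglikelihood(next_token_logprobs, context, continuation):
    """Sum log p(token | true prefix) over the continuation tokens."""
    prefix = list(context)
    total = 0.0
    for tok in continuation:
        total += next_token_logprobs(prefix)[tok]
        prefix.append(tok)  # feed the reference token, never a sampled one
    return total


def rolling_perplexity(next_token_logprobs, tokens, window):
    """Perplexity over a long sequence, scoring every token exactly once."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    total = 0.0
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        # Simplification: each window is scored from an empty context.
        total += teacher_forced_loglikelihood(next_token_logprobs, [], chunk)
    return math.exp(-total / len(tokens))


def uniform_model(prefix, vocab_size=4):
    """Toy stand-in for a model: uniform log-probabilities over a tiny vocabulary."""
    return [math.log(1.0 / vocab_size)] * vocab_size
```

With the uniform toy model, every token costs log(1/4), so the perplexity of any sequence comes out at the vocabulary size, which makes the sketch easy to sanity-check.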

Fixes

  • Dtype conversion adjustments in the bridge for more consistent behavior.
  • Linting fixes in tests and the Xerxes model.

Dependency Updates

  • Upgrade ejkernel to v0.0.50.

Merged PRs

  • #249: Unified attention mechanism + caching structures.
  • #250: Bridge dtype conversion updates.
  • #251: eSurge Speedup v1.

What's Changed

  • feat: Add unified attention mechanism and related caching structures by @erfanzar in #249
  • Modify dtype conversion in bridge by @erfanzar in #250
  • eSurge Speedup v1 (esurge/speedup-v1) by @erfanzar in #251

Full Changelog: v0.2.0.1...v0.2.0.2