Skip to content

Aksu v1.1.0a0 — SOTA GPU Acceleration + PDF Cleaning (alpha)

Pre-release
Pre-release

Choose a tag to compare

@github-actions github-actions released this 17 May 18:49
· 0 commits to main since this release

What's New in v1.1.0a0

SOTA improvements to NeuralBackend GPU acceleration and PDF text ingestion.

🚀 NeuralBackend GPU Stack

Device auto-detect (CUDA→MPS→XPU→CPU), bf16 mixed precision via torch.amp.autocast, torch.compile(mode="reduce-overhead"), torch.inference_mode(), batched API (predict_batch), pinned memory + non-blocking GPU transfer, warm-up pass.

Measured CPU speedup: 4.89× over v1.0.0a0 baseline (target: 1.3×)

  • Baseline: 21.75 tok/s → v1.1: 106.33 tok/s

🧹 PDF Text Cleaning Pipeline

New aksu.ariturk APIs:

  • reconstruct_line_breaks(text) — 4-signal hyphenation decoder (LEX→CP→VH→fallback)
  • fix_pdf_artifacts(text) — 6-stage: ftfy + NFKC + zero-width strip + repeat-collapse + header/footer + diacritic stub
  • is_morphologically_valid(word) — vowel harmony heuristic (detects loanwords)
  • TextCleaner.fix_line_breaks() / fix_artifacts() — additive methods, object-identity preserved

📚 Bundled Resources

Zemberek Turkish wordlist (100K entries, Apache 2.0), 302 KB gzipped at src/aksu/ariturk/data/turkish_wordlist.txt.gz.

🧪 Testing

  • §M.0 baseline gate: 1077 passed, 0 failed (v1.0.0a0 backwards-compatible)
  • 71 new tests: 14 NeuralBackend GPU + 22 PDF hyphenation + 21 artifact + 5 integration

📦 New Dependencies

  • Required: ftfy>=6.1, regex>=2023.0 (<1.5MB combined)
  • Optional aksu[full]: kenlm>=0.2.0, pyhyphen>=4.0.3

⚠️ Alpha Notes

  • Benchmark used synthetic-weights MorphAtomizer at production-realistic architecture (embed=128, hidden=256, layers=3); measures throughput, not accuracy
  • CUDA benchmark is informational for alpha; verified on TRUBA akya-cuda for v1.1.0 stable promotion

🛣️ Deferred to v1.2

  • KenLM Turkish 3-gram for hyphenation
  • Char-LM diacritic restoration
  • Lemma-aware lexicon lookup
  • High-level aksu.cleanup() one-shot API

Install

```bash
pip install aksu==1.1.0a0
```

Audit: audit/v1.1.0_release_report.md
Citation: CITATION.cff