Release Aksu v1.1.0a0 — SOTA GPU Acceleration + PDF Cleaning (alpha) · melikkul/Aksu

What's New in v1.1.0a0

This alpha release improves NeuralBackend GPU acceleration and adds a cleaning pipeline for text ingested from PDFs.

🚀 NeuralBackend GPU Stack

Device auto-detection (CUDA → MPS → XPU → CPU)
bf16 mixed precision via torch.amp.autocast
torch.compile(mode="reduce-overhead")
torch.inference_mode()
Batched API (predict_batch)
Pinned memory and non-blocking GPU transfer
Warm-up pass
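
The device priority and inference settings listed above could be wired together roughly as follows. This is a minimal sketch, not Aksu's actual code: `pick_device`, `torch_probes`, and `run_inference` are hypothetical names, and enabling bf16 autocast only on CUDA and CPU is an assumption made here for portability.

```python
from contextlib import nullcontext
from typing import Callable, Mapping

# Accelerators in the order the release notes give; CPU is the final fallback.
PRIORITY = ("cuda", "mps", "xpu")


def pick_device(probes: Mapping[str, Callable[[], bool]]) -> str:
    """Return the first accelerator whose probe reports True, else "cpu"."""
    for name in PRIORITY:
        probe = probes.get(name)
        if probe is not None and probe():
            return name
    return "cpu"


def torch_probes() -> dict:
    """Build availability probes from PyTorch (imported lazily)."""
    import torch

    # Older PyTorch builds may lack these submodules, so look them up defensively.
    mps = getattr(torch.backends, "mps", None)
    xpu = getattr(torch, "xpu", None)
    return {
        "cuda": torch.cuda.is_available,
        "mps": (lambda: mps.is_available()) if mps is not None else (lambda: False),
        "xpu": (lambda: xpu.is_available()) if xpu is not None else (lambda: False),
    }


def run_inference(model, batch):
    """Hypothetical forward pass: pinned transfer, inference mode, bf16 autocast."""
    import torch

    device = pick_device(torch_probes())
    model = model.to(device).eval()
    if device == "cuda":
        # Pinned host memory lets the copy below overlap with GPU work.
        batch = batch.pin_memory()
    batch = batch.to(device, non_blocking=True)
    # Assumption: autocast is applied only where bf16 is broadly supported.
    amp = (
        torch.autocast(device_type=device, dtype=torch.bfloat16)
        if device in ("cuda", "cpu")
        else nullcontext()
    )
    with torch.inference_mode(), amp:
        return model(batch)
```

Keeping the priority logic separate from the PyTorch probes makes the selection order easy to unit-test on machines without any accelerator.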

Measured CPU speedup: 4.89× over v1.0.0a0 baseline (target: 1.3×)

Baseline: 21.75 tok/s → v1.1: 106.33 tok/s
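
As a quick check, the speedup figure follows directly from the two throughput numbers reported above. The variable names are illustrative only.

```python
baseline_tok_s = 21.75   # v1.0.0a0 throughput from the release notes
v1_1_tok_s = 106.33      # v1.1.0a0 throughput from the release notes
target_speedup = 1.3

speedup = v1_1_tok_s / baseline_tok_s
print(f"speedup: {speedup:.2f}x")          # speedup: 4.89x
print(f"meets target: {speedup >= target_speedup}")
```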

🧹 PDF Text Cleaning Pipeline

New aksu.ariturk APIs:

reconstruct_line_breaks(text) — 4-signal hyphenation decoder (LEX→CP→VH→fallback)
fix_pdf_artifacts(text) — 6-stage: ftfy + NFKC + zero-width strip + repeat-collapse + header/footer + diacritic stub
is_morphologically_valid(word) — vowel harmony heuristic (detects loanwords)
TextCleaner.fix_line_breaks() / fix_artifacts() — additive methods, object-identity preserved
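
To illustrate the kind of signals described above, here is a minimal standard-library sketch of a lexicon lookup, a vowel-harmony check, and two of the artifact stages (NFKC and zero-width stripping). It is not Aksu's implementation: all function names are hypothetical, the CP signal and the ftfy stage are omitted, and keeping the hyphen as the fallback is an assumption.

```python
import re
import unicodedata

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
# A run of letters, a hyphen at end of line, then the continuation.
HYPHEN_BREAK = re.compile(r"([^\W\d_]+)-\n([^\W\d_]+)")


def tr_lower(word: str) -> str:
    """Lowercase with Turkish casing rules (I -> ı, İ -> i)."""
    return word.translate(str.maketrans("Iİ", "ıi")).lower()


def obeys_vowel_harmony(word: str) -> bool:
    """True if every vowel is back or every vowel is front (major harmony only)."""
    vowels = {c for c in tr_lower(word) if c in BACK_VOWELS or c in FRONT_VOWELS}
    return vowels <= BACK_VOWELS or vowels <= FRONT_VOWELS


def strip_artifacts(text: str) -> str:
    """Apply NFKC normalization, then drop zero-width characters."""
    return ZERO_WIDTH.sub("", unicodedata.normalize("NFKC", text))


def join_hyphenated(text: str, lexicon: set) -> str:
    """Rejoin words split across lines when the lexicon or harmony supports it."""
    def decide(match: re.Match) -> str:
        head, tail = match.group(1), match.group(2)
        joined = head + tail
        if tr_lower(joined) in lexicon or obeys_vowel_harmony(joined):
            return joined
        # Assumed fallback: treat the hyphen as real and only drop the line break.
        return f"{head}-{tail}"
    return HYPHEN_BREAK.sub(decide, text)


print(join_hyphenated("kitap-\nlar geldi", {"kitaplar"}))  # kitaplar geldi
print(join_hyphenated("e-\nposta", set()))                 # e-posta
print(obeys_vowel_harmony("kitap"))                        # False
```

Vowel harmony is only a heuristic: it flags loanwords such as "kitap" but also misfires on some native words, which is why the lexicon lookup comes first in this sketch.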

📚 Bundled Resources

Zemberek Turkish wordlist (100K entries, Apache 2.0), 302 KB gzipped at src/aksu/ariturk/data/turkish_wordlist.txt.gz.
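
A gzipped wordlist like this can be read into a set for fast membership checks. The sketch below assumes one UTF-8 word per line, which the release notes do not confirm; `load_wordlist` is a hypothetical helper, and the demo uses a small temporary file rather than the bundled resource.

```python
import gzip
import tempfile
from pathlib import Path


def load_wordlist(path) -> frozenset:
    """Read a gzipped text file into a set, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return frozenset(line.strip() for line in fh if line.strip())


# Demo with a throwaway file so the example runs without Aksu installed.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_wordlist.txt.gz"
    with gzip.open(sample, "wt", encoding="utf-8") as fh:
        fh.write("kitap\nokul\n\nev\n")
    words = load_wordlist(sample)

print(sorted(words))  # ['ev', 'kitap', 'okul']
```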

🧪 Testing

§M.0 baseline gate: 1077 passed, 0 failed (v1.0.0a0 backwards-compatible)
71 new tests: 14 NeuralBackend GPU + 22 PDF hyphenation + 21 artifact + 5 integration

📦 New Dependencies

Required: ftfy>=6.1, regex>=2023.0 (<1.5MB combined)
Optional aksu[full]: kenlm>=0.2.0, pyhyphen>=4.0.3
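
Code that relies on the optional extras can probe for them at runtime instead of failing on import. This is a generic sketch: `missing_modules` is a hypothetical helper, and the import names (in particular `hyphen` for pyhyphen) are assumptions about how those packages are imported.

```python
from importlib.util import find_spec


def missing_modules(names) -> list:
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


REQUIRED = ["ftfy", "regex"]
OPTIONAL = ["kenlm", "hyphen"]  # assumed import names for kenlm and pyhyphen

print("missing required:", missing_modules(REQUIRED))
print("missing optional:", missing_modules(OPTIONAL))
```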

⚠️ Alpha Notes

The benchmark used a MorphAtomizer with synthetic weights at a production-realistic architecture (embed=128, hidden=256, layers=3); it measures throughput, not accuracy
The CUDA benchmark is informational for this alpha; it is to be verified on TRUBA akya-cuda for promotion to v1.1.0 stable

🛣️ Deferred to v1.2

KenLM Turkish 3-gram for hyphenation
Char-LM diacritic restoration
Lemma-aware lexicon lookup
High-level aksu.cleanup() one-shot API

Install

```bash
pip install aksu==1.1.0a0
```

Audit: audit/v1.1.0_release_report.md
Citation: CITATION.cff
