translit 0.5.0
This release sharpens what translit is: Unicode adversarial-text defense and canonicalization, powered by Rust — TR39 visual confusable mapping, homoglyph / bidi / zalgo / invisible-character stripping, and standards-based Latin/Cyrillic/Greek transliteration. It also adds context-aware transliteration for abjad scripts and fixes a long-standing Linux packaging bug.
Highlights
Adversarial-text defense, front and center. translit maps confusables by appearance (TR39: Cyrillic р → Latin p), the mapping that actually reverses a homoglyph attack — unlike unidecode/anyascii/ftfy, which map phonetically and can't. The new Adversarial-Text Defense guide covers the phonetic-vs-visual distinction and the XMR benchmark evidence.
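To make the phonetic-vs-visual distinction concrete, here is a minimal standard-library sketch. The two lookup tables are hand-written toys covering three Cyrillic letters; they are assumptions for illustration and are not translit's data or API.

```python
import unicodedata

# Toy tables for illustration only; they are NOT translit's data.
# Keys: Cyrillic er (U+0440), es (U+0441), a (U+0430).
VISUAL = {"\u0440": "p", "\u0441": "c", "\u0430": "a"}    # mapped by appearance
PHONETIC = {"\u0440": "r", "\u0441": "s", "\u0430": "a"}  # mapped by sound


def remap(text: str, table: dict) -> str:
    """Replace each character found in `table`; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


# Renders like "product", but the first letter and the "c" are Cyrillic.
spoofed = "\u0440rodu\u0441t"

print(unicodedata.name(spoofed[0]))  # CYRILLIC SMALL LETTER ER
print(remap(spoofed, VISUAL))        # product
print(remap(spoofed, PHONETIC))      # rrodust
```

Only the appearance-based table recovers the word the attacker was imitating; the sound-based table yields a different ASCII string, so a blocklist match on "product" would miss it.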
from translit import strip_obfuscation, normalize_confusables, is_safe_hostname
strip_obfuscation("рroduсt") # → "product" (Cyrillic р→p, с→c via TR39)
normalize_confusables("раypal") # → "paypal"
safe, details = is_safe_hostname("аpple.com") # → (False, …) leading Cyrillic а
Context-aware transliteration for Arabic, Persian, and Hebrew. transliterate(text, context=True) uses dictionary-based vowel restoration (bigram → unigram → context-free) to produce readable romanization instead of consonant skeletons. Opt in with pip install translit-rs[arabic] / [hebrew] / [context].
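The bigram → unigram → context-free order can be read as a backoff lookup. The sketch below shows that general idea only: the function names, the dictionaries, and their entries are hypothetical toys keyed by Latin consonant skeletons, and the final fallback simply returns the skeleton as a stand-in for whatever context-free rules the library applies.

```python
from typing import Optional

# Hypothetical toy dictionaries keyed by consonant skeletons.
# They are illustrative placeholders, not translit's data.
BIGRAMS = {("hdhh", "ktb"): "kutub"}             # (previous word, word) -> form
UNIGRAMS = {"hdhh": "hadhihi", "ktb": "kataba"}  # word -> most likely form


def restore(prev: Optional[str], word: str) -> str:
    """Pick a vocalized form with bigram -> unigram -> context-free backoff."""
    if prev is not None and (prev, word) in BIGRAMS:
        return BIGRAMS[(prev, word)]  # most specific: uses the previous word
    if word in UNIGRAMS:
        return UNIGRAMS[word]         # word known, but not in this context
    return word                       # stand-in fallback: keep the skeleton


def romanize(skeletons: list) -> list:
    """Apply `restore` left to right, passing each word's predecessor."""
    out = []
    prev = None
    for word in skeletons:
        out.append(restore(prev, word))
        prev = word
    return out


print(romanize(["ktb"]))                 # ['kataba']
print(romanize(["hdhh", "ktb", "xyz"]))  # ['hadhihi', 'kutub', 'xyz']
```

The same skeleton resolves differently depending on its left neighbor, which is what lets a contextual pass produce readable output where a purely character-level pass cannot.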
Fixed
- Linux x86_64 wheels are now built as cp39-abi3. Earlier releases only shipped a cp38-cp38 x86_64 Linux wheel, forcing a source build (Rust toolchain) on Python 3.9+. pip install translit-rs now gets a prebuilt wheel on Linux x86_64 like every other platform. (#26)
- Documentation corrections (consistent language-profile count; verified homoglyph examples).
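Why the tag change matters can be shown with a simplified compatibility check. This is a rough sketch of the wheel-tag rules, not pip's actual resolver: it only understands cp3X Python tags, ignores the platform tag, and the wheel filenames are made-up examples.

```python
def wheel_fits(filename: str, py_minor: int) -> bool:
    """Rough check: can CPython 3.<py_minor> use this wheel's Python/ABI tags?

    Simplified on purpose: handles only 'cp3X' tags, ignores the platform tag.
    """
    # Wheel names end with: -<python tag>-<abi tag>-<platform tag>.whl
    stem = filename[: -len(".whl")]
    py_tag, abi_tag, _platform = stem.split("-")[-3:]
    built_for = int(py_tag[len("cp3"):])  # "cp39" -> 9, "cp310" -> 10
    if abi_tag == "abi3":
        # Stable-ABI wheels work on the tagged version and every later one.
        return py_minor >= built_for
    # Version-specific ABI wheels work on exactly one minor version.
    return py_minor == built_for


# Hypothetical filenames, used only to exercise the two tag styles.
old = "example-1.0-cp38-cp38-manylinux_2_17_x86_64.whl"
new = "example-1.0-cp39-abi3-manylinux_2_17_x86_64.whl"

print(wheel_fits(old, 8), wheel_fits(old, 12))  # True False
print(wheel_fits(new, 9), wheel_fits(new, 13))  # True True
```

A cp38-cp38 wheel matches only CPython 3.8, so newer interpreters fall back to the source distribution; a single cp39-abi3 wheel covers 3.9 and later.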
Security
- All third-party GitHub Actions pinned to commit SHAs across CI and the release pipeline; added Dependabot to keep them current. Dev/docs dependency bumps (Pygments 2.20.0, pytest 9.0.3).
Compatibility
No breaking changes. No public API, language codes, or script coverage were removed — translit-rs still has zero runtime dependencies. CJK/Indic/other scripts remain available as best-effort, unidecode-compatible coverage.
Install
pip install translit-rs
Full changelog: https://github.com/raeq/translit/blob/main/CHANGELOG.md