v0.4.0
Breaking changes
- Batch functions removed. `transliterate_batch()`, `slugify_batch()`, `normalize_batch()`, and `strip_accents_batch()` are gone. The base functions now accept both `str` and `list[str]` via `@typing.overload`:

      transliterate("café")             # → "cafe"
      transliterate(["café", "naïve"])  # → ["cafe", "naive"]
- `strip_obfuscation()` no longer transliterates. It now uses TR39 confusable mapping (visual similarity) instead of phonetic transliteration, and the `lang=` parameter has been removed. Chain with `transliterate()` if romanization is also needed.
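For readers migrating off the batch helpers, the `@typing.overload` pattern mentioned above can be sketched as follows. This is only an illustration of the pattern, not the library's implementation: `fold_accents` is a hypothetical stand-in built on the standard library's `unicodedata`, and the real functions may dispatch differently.

```python
import unicodedata
from typing import overload


@overload
def fold_accents(text: str) -> str: ...
@overload
def fold_accents(text: list[str]) -> list[str]: ...


def fold_accents(text):
    """Hypothetical stand-in: drop combining marks after NFD decomposition."""
    if isinstance(text, list):
        # List input: apply the scalar path to every element.
        return [fold_accents(item) for item in text]
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


single = fold_accents("café")             # one string in, one string out
many = fold_accents(["café", "naïve"])    # list in, list out
```

With overloads declared this way, a type checker infers `str` for the first call and `list[str]` for the second, so existing `*_batch()` call sites can usually switch to the base function without changing their arguments.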
New features
- `strip_obfuscation()`: maximum-strength deobfuscation preset. Resolves homoglyph spoofing (Cyrillic р→p, с→c), strips zalgo, invisible characters, and bidi attacks, and expands emoji.
- `lang_info()` / `script_info()`: structured metadata for all 83 languages and 57 scripts, with import-time drift assertions.
- 18 new languages (Balinese, Bamum, Buginese, Cherokee, Cham, Coptic, Tai Lue, Lisu, Meitei, Northern Thai, N'Ko, Santali, Sundanese, Syriac, Tai Le, Tagalog, Tamazight, Vai) and 10 new `Script` enum members.
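The difference between visual (confusable) mapping and phonetic transliteration is easy to see on the two Cyrillic letters named above: a phonetic romanization would turn р into "r" and с into "s", while a confusable fold maps each to the Latin letter it looks like. The sketch below is a minimal illustration under assumptions: the two-entry table and the `fold_confusables` helper are hypothetical, and the real TR39 data covers far more characters.

```python
# Hypothetical, tiny confusable table; real TR39 data is much larger.
CONFUSABLES = str.maketrans({
    "\u0440": "p",  # Cyrillic er, visually similar to Latin p
    "\u0441": "c",  # Cyrillic es, visually similar to Latin c
})

# A few invisible code points sometimes used to split up keywords.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def fold_confusables(text: str) -> str:
    """Drop invisible characters, then map look-alikes to Latin."""
    visible = "".join(ch for ch in text if ch not in INVISIBLE)
    return visible.translate(CONFUSABLES)


# Cyrillic es + zero-width space + "heap", then Cyrillic er + "ills"
spoofed = "\u0441\u200bheap \u0440ills"
cleaned = fold_confusables(spoofed)  # "cheap pills"
```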
Bug fixes
- Combining marks and zero-width characters no longer produce `[?]` (283 new TSV mappings)
- `TextPipeline` confusable ordering fixed (transliterate before confusables)
- `demojize()` spaces adjacent emoji replacements (`"🔥🔥"` → `"fire fire"`)
- `SCRIPT_RANGES` sort order fix + invariant test
- Tibetan documentation corrected (Indic-phonetic, not Wylie)
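To illustrate why a sort-order invariant on a range table is worth testing, here is a small sketch. It assumes the table is a list of `(start, end, name)` tuples looked up by binary search. That layout, the sample data, and both helper names are assumptions made for illustration; the actual `SCRIPT_RANGES` structure and lookup may differ.

```python
from bisect import bisect_right

# Hypothetical sample table: (start, end_inclusive, script_name).
SAMPLE_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
]


def assert_sorted_and_disjoint(ranges):
    """Invariant: each range is non-empty, and ranges ascend without overlap."""
    for start, end, _name in ranges:
        assert start <= end, f"empty range {start:#x}-{end:#x}"
    for prev, cur in zip(ranges, ranges[1:]):
        assert prev[1] < cur[0], f"unsorted or overlapping at {cur[0]:#x}"


def lookup_script(ch, ranges=SAMPLE_RANGES):
    """Binary search; only correct if the invariant above holds."""
    code = ord(ch)
    starts = [r[0] for r in ranges]
    i = bisect_right(starts, code) - 1
    if i >= 0 and ranges[i][0] <= code <= ranges[i][1]:
        return ranges[i][2]
    return None


assert_sorted_and_disjoint(SAMPLE_RANGES)
```

If the table were out of order, the binary search could silently return `None` for characters that do have a range, which is the kind of failure an invariant test catches early.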
Infrastructure
- API stability tests (133), mutation testing killers (92)
- CI restructured: 10× faster Python tests, path-filtered CodeQL, no duplicate runs
- Transliteration provenance documentation
- `docs/index.md` generated from `README.md` (single source of truth)
See CHANGELOG.md for full details.