A C version of the Python translitcodec module
fazalmajid/ctranslitcodec
-*- coding: utf-8 -*-

Unicode to 8-bit charset transliteration codec.

This package contains codecs for transliterating ISO 10646 texts into
best-effort representations using smaller coded character sets (ASCII,
ISO 8859, etc.). It is a C reimplementation of the [translitcodec][1]
module by Jason Kirtland, as well as a drop-in replacement. The translation
tables used by the codecs are from the [transtab][2] collection by
[Prof. Markus Kuhn][3], with some extensions by [Fazal Majid][4].

Three types of transliterating codecs are provided:

  "long", using as many characters as needed to make a natural
  replacement. For example, \u00e4 LATIN SMALL LETTER A WITH DIAERESIS
  ``ä`` will be replaced with ``ae``.

  "short", using the minimum number of characters to make a replacement.
  For example, \u00e4 LATIN SMALL LETTER A WITH DIAERESIS ``ä`` will be
  replaced with ``a``.

  "one", only performing single character replacements. Characters that
  can not be transliterated with a single character are passed through
  unchanged. For example, \u2639 WHITE FROWNING FACE ``☹`` will be passed
  through unchanged.

Using the codecs is simple::

    >>> import ctranslitcodec
    >>> import codecs
    >>> codecs.encode(u'fácil € ☺', 'ctranslit/long')
    u'facil EUR :-)'
    >>> codecs.encode(u'fácil € ☺', 'ctranslit/short')
    u'facil E :-)'

[1]: https://pypi.org/project/translitcodec/
[2]: https://www.cl.cam.ac.uk/~mgk25/unicode.html#libs
[3]: https://www.cl.cam.ac.uk/~mgk25/
[4]: https://majid.info/
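To make the difference between the three codec types concrete, here is a minimal pure-Python sketch. It does not use or reproduce this package's C code: the tables `LONG_TABLE` and `SHORT_TABLE` and the function `transliterate` are hypothetical names invented for illustration, the tables hold only a handful of entries instead of the full transtab data, and deriving the "one" table from the single-character entries of the short table is an assumption.

```python
# Illustrative sketch only; names and tables are hypothetical, not the
# package's API. The real codecs use the much larger transtab tables.

LONG_TABLE = {
    "\u00e4": "ae",   # LATIN SMALL LETTER A WITH DIAERESIS
    "\u00e1": "a",    # LATIN SMALL LETTER A WITH ACUTE
    "\u20ac": "EUR",  # EURO SIGN
    "\u263a": ":-)",  # WHITE SMILING FACE
    "\u2639": ":-(",  # WHITE FROWNING FACE
}

SHORT_TABLE = {
    "\u00e4": "a",
    "\u00e1": "a",
    "\u20ac": "E",
    "\u263a": ":-)",
    "\u2639": ":-(",
}

# Assumption: "one" keeps only replacements that are a single character.
ONE_TABLE = {ch: rep for ch, rep in SHORT_TABLE.items() if len(rep) == 1}

TABLES = {"long": LONG_TABLE, "short": SHORT_TABLE, "one": ONE_TABLE}


def transliterate(text, mode="long"):
    """Replace each character found in the chosen table; pass others through."""
    try:
        table = TABLES[mode]
    except KeyError:
        raise ValueError("unknown mode: %r" % (mode,))
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    sample = "f\u00e1cil \u20ac \u263a"
    for mode in ("long", "short", "one"):
        print(mode, transliterate(sample, mode))
```

In this sketch, characters without a table entry are always passed through unchanged, which is why the "one" mode leaves the face characters alone while still mapping ``ä`` to ``a``.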