Transliteration from Unicode to US-ASCII and ISO 8859-2.

Unidecode is a Java port of the Perl library Text::Unidecode, which transliterates Unicode text to US-ASCII. This implementation is not limited to ASCII: it currently also supports ISO-8859-2 (aka Latin 2) and can easily be extended to more charsets (contributions are welcome).

Please note that this is just a quick and dirty method of transliteration; it’s not a silver bullet! For a detailed description of its limitations, read the documentation of the original Text::Unidecode by Sean M. Burke.
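To make the "quick and dirty" caveat concrete, here is a minimal sketch of table-based transliteration using only the JDK. It is not the Unidecode implementation: the class name TableTransliteratorSketch, the method toAscii and the six-entry table are all hypothetical, chosen only to show that each character is replaced on its own, with no knowledge of language or context.

```java
import java.util.Map;

// Hypothetical illustration only: this is NOT how Unidecode is implemented.
final class TableTransliteratorSketch {

    // Tiny hand-written table (an assumption for this sketch).
    // A real table covers many thousands of characters.
    private static final Map<Character, String> TABLE = Map.of(
            '\u010C', "C",   // Latin capital C with caron
            '\u00E9', "e",   // Latin small e with acute
            '\u201E', "\"",  // double low-9 quotation mark
            '\u201C', "\"",  // left double quotation mark
            '\u2265', ">=",  // greater-than or equal to
            '\u2014', "--"   // em dash
    );

    // Replaces each non-ASCII char by its table entry, or "?" if unknown.
    // Works char by char, so characters outside the BMP are not handled.
    static String toAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c); // already ASCII, keep as-is
            } else {
                out.append(TABLE.getOrDefault(c, "?"));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("42 \u2265 24"));
    }
}
```

Because the lookup is context-free, a character that should be romanized differently depending on the language always gets the same replacement. That is the kind of limitation the Text::Unidecode documentation discusses.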

How to Use

Transliterate to ASCII

Unidecode unidecode = Unidecode.toAscii();

unidecode.decode("České „uvozovky“");
>>> Ceske "uvozovky"

unidecode.decode("42 ≥ 24");
>>> 42 >= 24

unidecode.decode("em-dash — is not in ASCII");
>>> em-dash -- is not in ASCII

>>> Nan Wu A Mi Tuo Fo

>>> amidaniyorai

Transliterate to ISO-8859-2

Unidecode unidecode = Unidecode.toLatin2();

unidecode.decode("České „uvozovky“");
>>> České "uvozovky"
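The difference from the ASCII output above is that characters representable in ISO-8859-2 are kept, while everything else is replaced. A rough sketch of that idea with the JDK's CharsetEncoder follows. The class Latin2FallbackSketch and its tiny fallback switch are hypothetical and are not taken from the library, which may work quite differently internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical illustration only: not the library's implementation.
final class Latin2FallbackSketch {

    // Keeps every char that ISO-8859-2 can encode and replaces the rest.
    static String toLatin2(String input) {
        // ISO-8859-2 is not among the charsets every JVM must support,
        // although common JDK builds ship it (assumption of this sketch).
        CharsetEncoder latin2 = Charset.forName("ISO-8859-2").newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (latin2.canEncode(c)) {
                out.append(c);           // representable in Latin 2, keep it
            } else {
                out.append(fallback(c)); // otherwise use a replacement
            }
        }
        return out.toString();
    }

    // Stand-in for a real transliteration table (assumption).
    private static String fallback(char c) {
        switch (c) {
            case '\u201E': // double low-9 quotation mark
            case '\u201C': // left double quotation mark
                return "\"";
            case '\u2265': // greater-than or equal to
                return ">=";
            default:
                return "?";
        }
    }

    public static void main(String[] args) {
        // Input written with Unicode escapes to keep this source ASCII-only.
        System.out.println(toLatin2("\u010Cesk\u00E9 \u201Euvozovky\u201C"));
    }
}
```

In this sketch the accented Czech letters survive because Latin 2 contains them, whereas the typographic quotes do not exist in that charset and fall back to a plain double quote.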


Unidecode unidecode = Unidecode.toAscii();


>>> K


Released versions are available in The Central Repository. Just add this artifact to your project:


However, if you want to use the latest snapshot version, you have to add the Sonatype OSS repository:

    <name>Sonatype repository for deploying snapshots</name>

Other implementations


This project is a fork of the unidecode written by 徐晨阳 (xuender).


This project is licensed under the Apache License 2.0.

Character transliteration tables used in this project are converted (and slightly modified) from the tables provided in the Perl library Text::Unidecode by Sean M. Burke and are distributed under the Perl license.