Transliteration from Unicode to US-ASCII and ISO 8859-2.



Unidecode is a Java port of the Perl library Text::Unidecode, which transliterates Unicode text to US-ASCII. This implementation is not limited to ASCII: it currently also supports ISO-8859-2 (aka Latin 2) and can easily be extended to more charsets (contributions are welcome).

Please note that this is just a quick and dirty method of transliteration; it is not a silver bullet! Read the detailed description of its limitations in the documentation of the original Text::Unidecode by Sean M. Burke.

How to Use

Transliterate to ASCII

Unidecode unidecode = Unidecode.toAscii();

unidecode.decode("České „uvozovky“");
>>> Ceske "uvozovky"

unidecode.decode("42 ≥ 24");
>>> 42 >= 24

unidecode.decode("em-dash — is not in ASCII");
>>> em-dash -- is not in ASCII

>>> Nan Wu A Mi Tuo Fo

>>> amidaniyorai
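
To make the idea behind the ASCII examples above more concrete, here is a minimal sketch of table-driven transliteration. It is not this library's implementation: the class name `TableTransliterationSketch`, the six-entry table, and the `?` placeholder for unmapped characters are all assumptions made for illustration. The real tables derived from Text::Unidecode cover far more of Unicode.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a tiny table-driven transliterator, not the library's code. */
final class TableTransliterationSketch {

    // Hypothetical, hand-picked table keyed by Unicode code point.
    private static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put(0x010C, "C");   // LATIN CAPITAL LETTER C WITH CARON
        TABLE.put(0x00E9, "e");   // LATIN SMALL LETTER E WITH ACUTE
        TABLE.put(0x201E, "\"");  // DOUBLE LOW-9 QUOTATION MARK
        TABLE.put(0x201C, "\"");  // LEFT DOUBLE QUOTATION MARK
        TABLE.put(0x2265, ">=");  // GREATER-THAN OR EQUAL TO
        TABLE.put(0x2014, "--");  // EM DASH
    }

    /** Replaces every non-ASCII code point with its table entry. */
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);                  // already ASCII, keep as-is
            } else {
                // Assumption of this sketch: unmapped characters become "?".
                out.append(TABLE.getOrDefault(cp, "?"));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Input is written with Unicode escapes to keep the source file pure ASCII.
        System.out.println(decode("\u010Cesk\u00E9 \u201Euvozovky\u201C"));
        System.out.println(decode("42 \u2265 24"));
    }
}
```

Because each character is replaced independently of its context and language, a lookup like this cannot choose between competing romanizations, which is one reason the approach is only a quick approximation.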

Transliterate to ISO-8859-2

Unidecode unidecode = Unidecode.toLatin2();

unidecode.decode("České „uvozovky“");
>>> České "uvozovky"
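
The difference between the two targets can be pictured as follows: characters that the target charset can represent are kept, and only the rest are replaced. The sketch below shows that idea using the JDK's `CharsetEncoder.canEncode`. It is an assumption about how such a feature could be built, not a description of this library's internals; the class name `CharsetAwareSketch` and the `fallback` parameter are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.function.IntFunction;

/** Illustrative only: keep what the target charset can encode, replace the rest. */
final class CharsetAwareSketch {

    /**
     * @param fallback hypothetical hook that supplies a replacement string
     *                 for a code point the target charset cannot encode
     */
    static String decode(String input, Charset target, IntFunction<String> fallback) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            // Encodable characters pass through; others go to the fallback.
            out.append(encoder.canEncode(s) ? s : fallback.apply(cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumes the running JDK provides ISO-8859-2; forName throws otherwise.
        Charset latin2 = Charset.forName("ISO-8859-2");
        // The fallback here is deliberately trivial: every unencodable
        // character becomes a straight double quote.
        System.out.println(decode("\u010Cesk\u00E9 \u201Euvozovky\u201C", latin2, cp -> "\""));
    }
}
```

With ISO-8859-2 as the target, the accented Czech letters survive and only the typographic quotes are replaced; with US-ASCII as the target, the accented letters would be sent to the fallback as well.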


Unidecode unidecode = Unidecode.toAscii();


>>> K


Released versions are available in The Central Repository. Just add this artifact to your project:


However, if you want to use the latest snapshot version, you have to add the Sonatype OSS repository:

    <name>Sonatype repository for deploying snapshots</name>

Other implementations


This project is a fork of the unidecode written by 徐晨阳 (xuender).


This project is licensed under Apache License 2.0.

Character transliteration tables used in this project are converted (and slightly modified) from the tables provided in the Perl library Text::Unidecode by Sean M. Burke and are distributed under the Perl license.