Charset converter in C
Pull request Compare This branch is 21 commits behind pinard:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


.. role:: code(strong)
.. role:: file(literal)

README file for Recode

.. contents::
.. sectnum::


What is Recode?

Here is version 3.6 for the Recode program and library.  Hereafter,
Recode means the whole package, :code:`recode` means the executable
program.  Glance through this :file:`README` file before starting
configuration.  Make sure you read files :file:`ABOUT-NLS` and
:file:`INSTALL` if you are not familiar with them already.

The Recode library converts files between character sets and usages.
It recognises or produces over 200 different character sets (or about
300 if combined with an :code:`iconv` library) and transliterates files
between almost any pair.  When exact transliteration are not possible,
it gets rid of offending characters or falls back on approximations.
The :code:`recode` program is a handy front-end to the library.

The Recode program and library have been written by François Pinard,
yet it significantly reuses tabular works from Keld Simonsen.  It is an
evolving package, and specifications might change in future releases.

On various Unix systems, Recode is usually compiled from sources,
see the `Installation`_ section below.  On Linux, it often comes
bundled.  Recode had been ported to other popular systems.  See both
`contrib/README`__ and the `Non-Unix ports`_ section below, to find
some more information about these.

__ contrib.html

Reports and collaboration

Send bug reports to' . A bug
report is an adequate description of the problem: your input, what you
expected, what you got, and why this is wrong.  Diffs are welcome, but
they only describe a solution, from which the problem might be uneasy
to infer.  If needed, submit actual data files with your report.  Small
data files are preferred.  Big files may sometimes be necessary, but do
not send them on the mailing list; rather take special arrangement with
the maintainer.

Your feedback will help us to make a better and more portable
package.  Consider documentation errors as bugs, and report them
as such.  If you develop anything pertaining to Recode or have
suggestions, let us know and share your findings by writing at . You may also choose to directly
write at, yet be warned that such
correspondence is often visible for a while through the Recode Web site.

If you feel like receiving releases and pretest announcements for the
Recode package, send a message to
having, in its body, a line saying::

    subscribe recode-announce

If you rather want to participate actively in discussions, pretesting
and development for Recode, do just as above, but this time, use::

    subscribe recode-forum

Visit for releases or pretests, and related
files.  In particular, button ``Browse`` gives access to a weekly mirror
of the current unpackaged work files, while button ``Folders`` gives
access to saved or pending correspondence.

Please *do not* widely redistribute releases having a letter after the
version numbers, as these are meant for pretesting only, and might not
be stable enough for other usages.

Development plan

My plan has long been to end the 3.x series of this package, rather
aiming 4.0 as a major internal rewrite.  As there is still a long
way before 4.0 gets ready, and *especially* because some of my good
collaborators insisted that I do so, there will be a Recode 3.7.  That
release is meant to provide a selection of user-contributed patches.

For prototyping what Recode will become and experimenting new concepts
more easily, I created a subsidiary and standalone project named
Recodec, meant to receive the best part of my development efforts in
this particular area.  Once I'll be happy with the prototype, the plan
is to rewrite it from Python to C, somehow.  Visit the Web pages for
this `Recodec project`__ for more information and details.  For now at
least, new features go to Recodec only.


Notes for version 3.7-beta2

Here are a few notes related to the beta2 pre-test release for the incoming
Recode 3.7.  I publish it to ease later exchanges of patches with testers.

+ The name has been changed from Free recode to Recode -- as "Free" was
  a four letter word to some people :-). :code:`recode` (no capital) still
  names the executable program specifically, or the distribution archive

+ Recode does not itself include :code:`libiconv` anymore.  However,
  it uses an external :code:`iconv` library if one is available at
  installation time, like :code:`libiconv` or the one provided within GNU
  :code:`libc`.  The ``-x:`` option to the program, or a new flag to the
  library :code:`recode_new_outer` function, inhibits the initialisation
  and usage of :code:`iconv`.

+ The bug about loosing a few characters, here and there, when recoding
  big files in :code:`iconv` context, seems to have been corrected.  A
  patch for this problem has been floating around for years, but it was
  not solving all cases.

+ Recode installation now uses Python.  In particular, it creates
  file :file:`build/src/iconvdecl.h` from local ``iconv -l`` output.
  Recode testing through ``make check`` also needs what people
  :code:`python-devel`, providing C header files for Python and
  :code:`distutils`.  The :file:`Makemore` file has been merged within
  regular Makefiles and is not distributed separately anymore.

+ It is likely that new bugs have been introduced through the above
  changes.  In particular, not everything is cosy on the side of release
  engineering.  A few files are either spuriously remade, or remade late.
  I'm a bit surprised by the difficulty to get this right.

+ ``make check`` accepts a ``LIMIT=`` option, for limiting tests to one or
  a few cases.  See :file:`tests/Makefile` for more information.

+ PO files have been updated from the Translation Project.

Notes for version 3.7-beta1

The beta 1 pre-test release for the incoming Recode 3.7 has been made
available for those needing it right away.  While it solves some serious
bugs and portability problems, others are meant to be addressed only in
later pre-tests.  In particular, none of charset or surface issues, user
requests, and various suggestions appear in this pre-test, and will not
either in later pretests, until all real show-stoppers are solved first.
So this is in no way a candidate for a Recode 3.7 release.

The test suite is worth more comments:

+ The suite is very partial, and may not be thought as a validation
  suite.  Before it could be used to ascertain confidence, it would need
  much more tests than it has already.

+ Testing is notably more speedy than it used to be.  For example, the
  previous :code:`bigauto` test, which was not run by default because it
  ran for too long, is now executed within the standard test suite, once
  in non-strict mode, and a second time in strict mode.

+ It does not use Autotest anymore, but rather a home grown test driver
  much inspired from the Codespeak project.  The link between the test and
  the Recode library is established through a Pyrex interface, so you need
  to have :code:`python` and :code:`python-devel` installed first.

+ Beware that the Pyrex interface to the Recode library is only meant
  for testing, for now at least.  While you may play with it, it would not
  be wise relying on it, as the specifications might change at any time.



Simple installation of Recode requires the usual tools and facilities as
those needed for most GNU packages.  If not already bundled with your
system, you also need to pre-install Python, version 2.2 or better.  You
may get it from:

It is also convenient to have some :code:`iconv` library already present
on your system, this much extends Recode capabilities, especially in
the area of Asiatic character sets.  GNU :code:`libc`, as found on
Linux systems and a few others, already has such an :code:`iconv`
library.  Otherwise, you might consider pre-installing the portable
:code:`libiconv`, written by Bruno Haible.  You may get it from:

Getting a release

Source files and various distributions (either latest, prestest, or
archive) are available through:
File timestamps after checkout may trigger Make difficulties.  As a way
to avoid these, from the top level of the distribution, execute ``sh`` before configuring.  If you miss either :code:`sh` or
GNU :code:`touch`, try ``python`` instead.

Configure options

Once you have an unpacked distribution, see files:

  ===================  =======================================================
  File name            Description
  ===================  =======================================================
  :file:`ABOUT-NLS`    how to customise this program to your language
  :file:`COPYING`      copying conditions for the program
  :file:`COPYING.LIB`  copying conditions for the library
  :file:`INSTALL`      compilation and installation instructions
  :file:`NEWS`         major changes in the current release
  :file:`THANKS`       partial list of contributors
  ===================  =======================================================

Besides those configure options documented in files :file:`INSTALL`
and :file:`ABOUT-NLS`, a few extra options may be accepted after

+ Options ``--disable-shared`` or ``--disable-static``

  to inhibit the building of shared libraries or static libraries; the
  default is to always build static libraries, and to attempt building
  shared libraries if there is some known recipe for this.

+ Option ``--with-gnu-ld``

  to force the assumption that the C compiler uses GNU ld.

+ Option ``--with-dmalloc``

  to trigger a debugging feature for looking at memory management
  problems, it pre-requires Gray Watson's package, which is available as .

Maintenance tools

For simple modifications to Recode, you should not need special tools
beyond those usual for installing GNU packages.  However, if you modify
any :file:`.l` source file, Python and Flex are both needed for remaking

For more comprehensive modifications, you might need more tools.  If not
done already, make sure you have a copy of the packages listed in the
following table.  You may also choose to establish a link in your build
:file:`doc/` directory, as explained within :file:`doc/Makemore`.

  ================   ==========   ==========   =============
  Package name       Current      Minimum      Install after
  ================   ==========   ==========   =============
  :code:`autoconf`   2.61         2.12         :code:`m4`
  :code:`automake`   1.10         1.9          :code:`Perl`
  :code:`Flex`       2.5.33       2.5.4a
  :code:`gettext`    0.16         0.16
  :code:`Help2man`   1.36         1.020        :code:`Perl`
  :code:`libtool`    1.5.24       1.3.4
  :code:`m4`         1.4.10       1.4n
  :code:`Make`       3.81
  :code:`Perl`       5.8.8        5.005.03
  :code:`Python`     2.5.1        2.2
  :code:`tar`        1.17         1.12
  :code:`wget`       1.10.2
  ================   ==========   ==========   =============

The *current* version numbers just happen to be those used for
development, it is often likely that older versions would work just as
well.  The *minimum* version numbers were once acceptable, they might
not be anymore, this has not been verified; any updating information is

Installation hints

Here are a few hints which might help installing Recode on some systems.
Many may be applied by temporary presetting environment variables while
calling ``./configure``.  File :file:`INSTALL` explains this.

+ Compilation time

  Some C compilers, like Apollo's, have a hard time compiling
  :file:`merged.c`.  If this is your case, avoid compiler
  optimisation.  From within the Bourne shell, you may use::

    CFLAGS= ./configure

  But if you want to give a real hard time to your C optimiser on
  :file:`merged.c`, to get code that runs only a bit faster, merely

      CPPFLAGS=-DINLINE_HARDER ./configure

+ Smallish systems

  For 80286 based systems (do some still exist?!), it has been
  reported that some compilers generate wrong code while optimising
  for *small* models.  So, from within the Bourne shell, do::

      CFLAGS=-Ml LDFLAGS=-Ml ./configure

  to force large memory model.  For 80286 Xenix compiler, the last time
  it was tried a while ago, one ought to use::

      CFLAGS='-Ml -F2000' LDFLAGS=-Ml ./configure

  Other systems have poor :code:`pipe`/:code:`popen` support or thrash
  heavily when processes fork.  In this case, just before doing
  ``make``, edit :file:`config.h` and ensure :code:`HAVE_PIPE` is
  *not* defined.

External pointers


+ IETF references

  + Character Mnemonics & Character Sets


    Keld Simonsen <>, 1992-06.

  + UTF-7 - A Mail-Safe Transformation Format of Unicode


    David Goldsmith <> and Mark Davis
    <>, 1994-07.

  +  UTF-8, a transformation format of Unicode and ISO 10646


     François Yergeau <>, 1997-10.

+ Various references

  + Unicode charset mappings


    The Unicode consortium makes available plenty of charset mappings
    for converting "legacy" charsets to Unicode.

  +  Normalisation et internationalisation: Inventaire et prospectives des
     normes clefs pour le traitement informatique du français.  (392p.)

     This is a report, written in French, discussing charset issues and many
     other topics as well.  Laurent Bourbeau <>
     and François Pinard <>, 1995-10.


+ Recode specific

  + ETL presentation

    In 1999, the organisers of the `m17n99 conference`__ in Tsukuba,
    Japan, were kind enough to invite me.  This has been for me a
    fabulous trip and experience, and I met many extraordinary people in
    there.  At the conference, I presented the Translation Project, and
    Recode.  The Recode `presentation slides`__ are available.

__ m17n99.html


+ :code:`libiconv`

  This comprehensive charset converter library revolves around Unicode,
  and support Asian encodings among many others.  Even Recode uses it!


  Bruno Haible <>

+ :code:`tcs`

  Here is the main recoding tool from the Plan9 project.


+ :code:`yuedit`

  This GUI editor handles many encodings, among which UTF-8.  It also
  installs uniconv, a recoding program, and uniprint, a printing tool.


  Gaspar Sinai <>, 1999-01.

+ :code:`ucs-fonts`

   These 6x13 fonts, covering Unicode characters besides the Asian sets,
   merely replace the Linux fixed 6x13 font.  Works nicely with yudit.


   Markus Kuhn <>, 1998-11.

+ :code:`MtRecode`

  This charset converter is oriented towards SGML text manipulation.  It
  may be freely downloaded for non-commercial, non-military use from:


  Pointer given by Jean Véronis <>, 1996-06.

+ :code:`sp`

  This quite nice SGML structure analyser contains internal C++ modules
  for handling many charsets.


  James Clark <>

+ :code:`b2c`

  This program is able to generate interpreted
  character dumps, but properly embedded within complete C header files.


  Jörg Heitkötter <>, 1997-11.

+ :code:`PyRecode`

  This wrapper provides Recode functionality to Python programs.


  Andreas Jung <>

  Also see:


Non-Unix ports

Please if you are aware of various
ports to non-Unix systems not listed here, or for corrections.  Please
provide the goal system, a complete and stable URL, the maintainer name
and address, the Recode version used as a base, and your comments.


  Juan Manuel Guerrero <> maintains this port,
  dated 2001-03 and based on Recode 3.5.  The following archives hold
  binaries, docs and sources respectively.


  See `contrib/DJGPP/README`__ in the Recode distribution for more
  information about compiling this port.

  __ /DJGPP.html

+ MSDOS (Gnuish)

  Darrel Hankerson <> maintains this port, dated
  1994-11 and based on Recode 3.4.  You get many GNU tools, not only
  Recode.  The GNUish project is described in :file:`gnuish_t.htm`.

  + (Germany)

+ OS/2 (using emx/gcc)

  Maintainer unknown (maybe Kai Uwe Rommel <>), dated
  1994-11 and based on Recode 3.4.