Command-line utility for converting Microsoft Word documents to text
Clone or download
Latest commit 37af16c Mar 17, 2011
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
charsets Initial import Mar 17, 2011
compat Initial import Mar 17, 2011
doc Initial import Mar 17, 2011
src Initial import Mar 17, 2011
CODING.STD Initial import Mar 17, 2011
COPYING Initial import Mar 17, 2011
CREDITS Initial import Mar 17, 2011
INSTALL Initial import Mar 17, 2011
INSTALL.dos Initial import Mar 17, 2011
Makefile.in Initial import Mar 17, 2011
NEWS Initial import Mar 17, 2011
README Initial import Mar 17, 2011
TODO Initial import Mar 17, 2011
acconfig.h Initial import Mar 17, 2011
configure Initial import Mar 17, 2011
configure.in Initial import Mar 17, 2011
install-sh Initial import Mar 17, 2011
missing Initial import Mar 17, 2011
mkinstalldirs Initial import Mar 17, 2011
stamp-h Initial import Mar 17, 2011

README

CATDOC version 0.93

CATDOC is program which reads MS-Word file and prints readable 
ASCII text to stdout, just like Unix cat command.
It also able to produce correct escape sequences if some UNICODE
charachers have to be represented specially in your typesetting system
such as (La)TeX.

This is completely new version of catdoc, rewritten from scratch. 
It features runtime configuration, proper charset handling,
user-definable output formats and support
for Word97 files, which contain UNICODE internally.

Since 0.93.0 catdoc parses OLE structure and extracts WordDocment
stream, but doesn't parse internal structure of it.

This rough approach inevitable results in some garbage in output file,
especially near the end of file and if file contains embedded OLE objects,
such as pictures or equations.

So, if you are looking for purely authomatic way to convert Word to LaTeX,
you can better investigate word2x, wvware or LAOLA.


Catdoc is distributed under GNU Public License version 2 or above.


Your bug reports and suggestions are welcome.

There is also major work to do - define correct TeX commands
for accented latin letters into tex.specchars file and commands
for mathematical symbols (unicode 20xx-25xx). 


Contributions are welcome.

See files INSTALL and INSTALL.dos for information about  compiling and
installing catdoc.

Catdoc is documented in its UNIX-style manual page. For those who don't
have man command (i.e. MS-DOS users) plain text and postscript versions
of manual are provided in doc directory
                    Victor Wagner <vitus@45.free.net>