Initial import
petewarden committed Mar 17, 2011
0 parents commit 37af16c
Showing 108 changed files with 22,677 additions and 0 deletions.
72 changes: 72 additions & 0 deletions CODING.STD
@@ -0,0 +1,72 @@
CATDOC CODING STANDARD
~~~~~~~~~~~~~~~~~~~~~~
0. CATDOC ISN'T WRITTEN IN C++!!!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C and C++ are different languages.
No // comments, no references, no declarations in the middle of a block.

1. Catdoc is a portable program.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please never make the following assumptions:
1. That int is more than 16 bits wide
(consequently, that a signed int can hold a Unicode character)
2. That sizeof(int)>=sizeof(int *)
3. That int is always 16-bit (it can be 32 bit as well)
4. That long is 32-bit
5. That char (and int and short as well) is either signed or unsigned.
Always use an explicit signedness specifier.
6. That integer arithmetic is 32 bits wide.
7. That input is always seekable. Catdoc is often used as a filter.
8. That filenames are either case-sensitive or case-insensitive.
9. That there is no difference between binary and text file opening modes.
10. That opening a file in text mode will do something reasonable.
Always open files in binary mode. This is the only way to produce
results that are consistent on all platforms.
11. That you can rely on the compiler's POSIX or C99 compliance. If you need
to use some function defined by these standards, write a configure test
and provide a fallback.
12. That you can allocate a chunk of memory larger than 64K.
13. That filenames can be longer than 8+3.
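
A minimal sketch of how items 1, 5 and 10 might look in practice. The
helper names get_u16_le and open_input are hypothetical and are not taken
from the catdoc sources. The sketch only assumes that 16-bit values are
stored low byte first, as they are in Word files.

```c
#include <stdio.h>

/*
 * Assemble a 16-bit little-endian value from two bytes.
 * The buffer is explicitly unsigned char, because plain char may be
 * signed, and the result is unsigned long, because int may be only
 * 16 bits wide.
 */
unsigned long get_u16_le(const unsigned char *buf)
{
	return (unsigned long)buf[0] | ((unsigned long)buf[1] << 8);
}

/*
 * Open an input file in binary mode, so that no end-of-line
 * translation happens on platforms that distinguish text and
 * binary files. Returns NULL on failure, like fopen.
 */
FILE *open_input(const char *name)
{
	return fopen(name, "rb");
}
```

Doing the arithmetic in unsigned long keeps the shift well defined even
where int is 16 bits wide.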

2. Catdoc is used world-wide
~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Never write comments in languages other than English.
2. Never assume that you can output a character without passing it through
the convert_char function.

3. Code formatting
~~~~~~~~~~~~~~~~~
1. Use <Tab> for indentation. If your text editor insists on <Tab> being
8 chars wide, consider using some other editor. vim is at least a bit more
portable than catdoc.
2. Put the opening curly bracket on the same line as the statement it belongs to:
	if (condition) {
		code
	}
rather than
	if (condition)
	{
		code
	}

3. The only exception to rule 2 is blocks in the switch statement:
	switch (var) {
	case value:
		{
			code
		}
	}
rather than
	switch (var) {
	case value: {
		code
	}
	}

4. Write comments at the start of each function describing its purpose
and arguments.

5. If you use some potentially dangerous construct, such as sprintf on a
static buffer, comment why it is safe in this particular case.
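
As an illustration of rules 4 and 5, here is a minimal sketch of a
function that uses sprintf on a static buffer and documents why that is
safe. The function format_codepoint is a hypothetical example and is not
part of catdoc.

```c
#include <stdio.h>

/*
 * Format the low 16 bits of code as "U+XXXX".
 * Argument: code - the character code to format.
 * Returns a pointer to a static buffer, which is overwritten
 * by the next call.
 */
const char *format_codepoint(unsigned int code)
{
	static char buf[7];

	/*
	 * sprintf is safe here: the value is masked to 16 bits, so
	 * "%04X" always produces exactly 4 hex digits. Together with
	 * "U+" and the terminating NUL that is 7 bytes, the size of buf.
	 */
	sprintf(buf, "U+%04X", code & 0xFFFFU);
	return buf;
}
```

The comment states both the bound on the output and the size of the
buffer, so a reviewer can check the claim without reading other code.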

340 changes: 340 additions & 0 deletions COPYING

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions CREDITS
@@ -0,0 +1,35 @@
Note: people listed in this file are listed in arbitrary order.
Kawai Takanori (Hippo2000) kwitknr@cpan.org
Author of the Perl module Spreadsheet::ParseExcel, which I use as
a reference manual for the Excel format
Alex Ott <ott@jet.msk.su>
Fixed handling of long SST, contributed handling of RK records,
wrote RTF and OLE parsers
Pawel Wiecek <coven@debian.org>
Current maintainer of the Debian catdoc package
Peter Novodvosky <nidd@debian.org>
maintained debian package for catdoc.
Bjorn Brenander <bjorn@debian.org>
maintained debian package for catdoc.
Eugene B. Byrganov <E.B.Byrganov@inp.nsk.su>
Suggested -l switch, found me an example of partly 8-bit/partly
16-bit file and some typos in builtin docs. Fixed some long-standing
bugs in config-parsed code.
Artem Chuprina <ran@ran.pp.ru>
Provided a lot of bugfixes and suggestions. Also maintained some
unofficial packaged versions of catdoc.
Stephen Farrell <stephen@farrell.org>
maintains the FreeBSD port, and persuaded me to write the autoconf
configuration
Martin Kraemer <martin.kraemer@mch.sni.de>
contributed some fixes for ascii.rpl and noted typo in catdoc.h
Arfst Ludwig <Arfst.Ludwig@LHSystems.COM>
gave me the idea of creating README.charset
Dmitry Potapov <dpotapov@capitalsoft.com>
contributed rtf-parsing code
David Rysdam
Wrote the program biffview, which parses XLS files and was used as the base
for xls2csv.
Duncan Simpson <dps@io.stargate.co.uk>
audited catdoc code for possible buffer overruns (and found many more
of them than actually existed)
71 changes: 71 additions & 0 deletions INSTALL
@@ -0,0 +1,71 @@
INSTALLING catdoc 0.91.x

Starting with patchlevel alpha 3, catdoc version 0.90 has an autoconf
configuration. Thanks to Stephen Farrell for convincing me.

So typically you should run
./configure
make
make install

to compile and install catdoc.

NOTE for HPUX users. If you want to compile catdoc with aCC,
use CC="aCC -Ae" ./configure

The configure script for catdoc recognizes the following options (apart from
the standard --prefix, --exec-prefix and so on):

--disable-wordview - disables building of the Tcl/Tk viewer wordview,
which requires X11. (Note that it is disabled automatically
if you don't have an appropriate version of Tcl/Tk.) You may
wish to use this if you don't have X installed.

--with-wish=path - specifies the path to the wish interpreter. This option has
two uses:
1. If the executable named wish found in your PATH is old, and
you have a newer wish installed as wish4.2 or wish8.0,
you should specify this in order to build the wordview viewer.
2. If you are compiling catdoc over a telnet connection or on a
text console, you can specify this option to skip the Tcl
version check, which would run wish and fail if it couldn't
find an X display (which would lead configure to assume that
you don't have a good wish).

--with-input=charset
--with-output=charset
Allows you to specify the charset names to expect in 8-bit Word
files and to produce in the output text file. Do ls ./charsets/*.txt
to find out which charsets are provided in the distribution.
Additional charsets can be obtained from
ftp.unicode.org
Note that make will fail if you specify a charset which
doesn't exist in the charsets directory.

--disable-charset-check
By default, make in the charsets directory fails if it is unable
to find the *.txt files corresponding to the default input and output
charsets. This option allows you to disable this check. Make
in the charsets directory will then always succeed, but it is your
responsibility to provide charset files in the catdoc library
directory after make install.
--disable-langinfo
By default, catdoc tries to use your current locale charset
as its output charset. It can, of course, always be overridden
by a command line switch. But the charset from the locale takes
precedence over the charset in the configuration file, unless
you put use_locale=no into this file.

If your C library is not XPG4-compatible and configure fails
to detect this, you can completely disable langinfo support
using this switch.

If you experience strange and unexpected behavior of catdoc, try
removing the optimization flag (-O2) from FLAGS in src/Makefile.
If you can write an autoconf test to check for this problem, please send it
to me.

This was a known problem with version 0.35 on HP/UX 9, and I have scarcely
changed my style of writing since.


76 changes: 76 additions & 0 deletions INSTALL.dos
@@ -0,0 +1,76 @@
INSTALLING catdoc 0.90a on MS-DOS system.

Surprise: MS-DOS is the native platform for this version of catdoc.
Unlike the previous version, which was a UNIX program ported to
DOS, this one was developed under DOS on a nine-year-old 286 laptop
with Turbo C 2.0.

So, catdoc works perfectly well on MS-DOS systems.

Documentation can be found in the files CATDOC.TXT and CATDOC.PS
(both produced by the UNIX man command).

If you've fetched the BINARY DISTRIBUTION, note the following:

1. catdoc expects to find its system-wide configuration file
in the same directory as the executable (and therefore requires DOS
version 3 or above). If you wish to move the charset and special char
maps to a location other than the default (the charsets subdirectory of
the directory containing the executable), you must have this configuration
file.

2. Any file name in the configuration file can contain a %s escape, which
is replaced by the directory of the executable.

3. All configuration files can use either the DOS or the UNIX end-of-line
convention.

4. Per-user configuration probably won't work. But try defining the
environment variable HOME and putting a catdoc.rc file in the directory
it points to.

5. Catdoc uses the DOS country information, as specified by the COUNTRY
statement in your DOS configuration file, to determine the output encoding.
These settings have priority over settings in the catdoc configuration files
(either per-user or system-wide). If that is not what you want, set
use_locale = no in the configuration file.
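
The %s escape described in item 2 could be expanded roughly as in the
sketch below. It is not the code catdoc actually uses. The function name
expand_exe_dir, its signature, and the choice to expand only the first
%s are assumptions made for illustration. Whether the directory string
carries a trailing backslash is also not stated above, so the sketch
copies it as given.

```c
#include <stdio.h>
#include <string.h>

/*
 * Expand the first "%s" in tmpl to dir (hypothetical helper).
 * Arguments: tmpl - file name from the configuration file,
 *            dir - directory of the executable,
 *            out, outsize - destination buffer and its size.
 * Returns 0 on success, -1 if the result would not fit into out.
 */
int expand_exe_dir(const char *tmpl, const char *dir,
		char *out, size_t outsize)
{
	const char *mark = strstr(tmpl, "%s");
	size_t prefix, dirlen, restlen;

	if (mark == NULL) {
		if (strlen(tmpl) + 1 > outsize) {
			return -1;
		}
		/* strcpy is safe: the length was checked just above */
		strcpy(out, tmpl);
		return 0;
	}
	prefix = (size_t)(mark - tmpl);
	dirlen = strlen(dir);
	restlen = strlen(mark + 2);
	if (prefix + dirlen + restlen + 1 > outsize) {
		return -1;
	}
	memcpy(out, tmpl, prefix);
	memcpy(out + prefix, dir, dirlen);
	/* strcpy is safe: the total length was checked above */
	strcpy(out + prefix + dirlen, mark + 2);
	return 0;
}
```

The caller supplies the buffer and its size, so a configuration line that
is too long produces an error return instead of an overrun.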

If you insist on COMPILING catdoc YOURSELF:
Please note that catdoc was compiled under DOS using Turbo C 2.01,
downloaded from http://community.borland.com/museum. You can get the
same one.

I've made some attempts to compile catdoc with Watcom C (16-bit),
but haven't completely succeeded. If you do, let me know.

1. With a 16-bit compiler, use the COMPACT memory model.
If you are using Turbo C, make -fmakefile.tc in the src directory
should be enough. If you have to change anything in
makefile.tc, please let me know.

2. If you are using a compiler other than Turbo C/Borland C or
Watcom, you should take a look at the fileutil.c file and possibly
add a couple of #ifdefs there. If you succeed with it, send me a
patch (or the entire modified file, if you don't know how to make
a good unix-like patch).


3. With a 32-bit compiler you are on your own. I don't think that
small utilities like catdoc should require an extender or DPMI host,
so I've never tried to build a 32-bit version of catdoc for DOS.
But if you mix the buffer sizes from the UNIX version with the file-name
dependent defines from DOS, you should probably achieve good
results.

4. With Turbo C you'll need the file getopt.c, which comes with Turbo C,
and unistd.h, which is provided in the compat directory.
Compile getopt.c and add it to cc.lib, and put unistd.h in
your include directory. Later it might help you to port other
unix software. With other compilers you can also make use
of the getopt.c in the compat directory (which is from GNU), but I was
unable to make it work with Watcom 10.0.

5. It is probably a good idea to link wildargs.obj (or wildargv.obj)
with catdoc. I didn't do it myself because I use the Korn shell on
the machine where I developed catdoc, so I don't need to include
parameter expansion in the program.
26 changes: 26 additions & 0 deletions Makefile.in
@@ -0,0 +1,26 @@

# Your C compiler and flags
SHELL = /bin/sh


all:
	for i in src doc charsets; do\
		(cd $$i; $(MAKE) all);\
	done

install:
	for i in src doc charsets; do\
		(cd $$i; $(MAKE) install);\
	done
clean:
	for i in src doc charsets; do\
		(cd $$i; $(MAKE) clean);\
	done
distclean:
	for i in src doc charsets; do\
		(cd $$i; $(MAKE) distclean);\
	done
	rm Makefile config.*
dist:
	$(MAKE) -C doc dosdoc
	$(MAKE) distclean
67 changes: 67 additions & 0 deletions NEWS
@@ -0,0 +1,67 @@
0.90.1 Nov 26 1998
Top-level Makefile now uses $MAKE instead of make
fixed missing end-line escaping in wordview.tcl
All occurrences of strcpy, strcat and sprintf investigated
to avoid buffer overflows.
0.90 Oct 29 1998
Fixed bug with charset names redeclared locally in main()
Fixed problem in configure with wish 8.0.3
Catdoc considered to be stable enough for release
0.90b5 Oct 14 1998
Fixed handling of 0x1F char (soft hyphen in Word 6.0),
now it is translated to 0x00AD (unicode soft hyphen)
Fixed permissions for manual page
Added --with-install-root configure arg to simplify
building of binary packages.
0.90b4 September 17 1998
Added proper configuration of library dir in wordview.
Added --disable-charset-check config option
Added 0x2026 symbol in ascii.rpl
Added more Windows codepages in distribution
0.90b3 September 11 1998
Added -x switch to simplify debugging of substitution maps
0.90b2 September 10 1998
Added some symbols in the 0x2000-0x20FF range to substitution maps.
These symbols occur in cp1251, so they are frequently found
in Word files. Fixed some filename-handling problems in
wordview.tcl

0.90b1 September 8 1998
Added us-ascii.charset, fixed small bugs in configure,
install is used for all installation files. Code is
considered stable enough to be beta.

0.90a3 September 7 1998
Fixed small bug in table handling, which caused catdoc to
output extra column delimiter just before row delimiter. Added
autoconf configuration. install is back, although not for
charsets

0.90a2 August 18 1998
version 0.90 was tested on BSDI and Solaris platform. Makefile
was rewritten to avoid use of highly incompatible
/usr/{ucb,bin}/install

0.90a1 August 13 1998
Catdoc underwent a major rewrite. Now it has proper charset
handling, including UNICODE and runtime configurability.

0.35 - June 5 1998
Fixed bug with -s switch which prevented catdoc from returning
non-zero code when invoked on a UNIX text file

0.34 - Apr 28 1998
Files now opened in binary mode thus allowing catdoc to work on
DOS and similar systems. All specs arrays now have terminating
NULL

0.33 - October 1997
Fixed missing terminating NUL in specs array, which caused
random segfaults on Linux and many other systems, because
_specs_ is searched by the _strchr_ function

0.32 - August 1997
First major public release, uploaded to CTAN. Tk interface
appeared, manual page was written. Unfortunately, this release
was buggy.

45 changes: 45 additions & 0 deletions README
@@ -0,0 +1,45 @@
CATDOC version 0.93

CATDOC is a program which reads an MS-Word file and prints readable
ASCII text to stdout, just like the Unix cat command.
It is also able to produce correct escape sequences if some UNICODE
characters have to be represented specially in your typesetting system,
such as (La)TeX.

This is a completely new version of catdoc, rewritten from scratch.
It features runtime configuration, proper charset handling,
user-definable output formats and support
for Word97 files, which contain UNICODE internally.

Since 0.93.0 catdoc parses the OLE structure and extracts the WordDocument
stream, but doesn't parse its internal structure.

This rough approach inevitably results in some garbage in the output file,
especially near the end of the file and if the file contains embedded OLE
objects, such as pictures or equations.

So, if you are looking for a purely automatic way to convert Word to LaTeX,
you had better investigate word2x, wvware or LAOLA.


Catdoc is distributed under the GNU General Public License version 2 or above.


Your bug reports and suggestions are welcome.

There is also major work to do: define correct TeX commands
for accented Latin letters in the tex.specchars file, and commands
for mathematical symbols (Unicode 20xx-25xx).


Contributions are welcome.

See files INSTALL and INSTALL.dos for information about compiling and
installing catdoc.

Catdoc is documented in its UNIX-style manual page. For those who don't
have the man command (e.g. MS-DOS users), plain text and PostScript versions
of the manual are provided in the doc directory.
Victor Wagner <vitus@45.free.net>

