4D implementation of DocToText.
The goal of this project is to support legacy Microsoft Word documents with the .doc file extension.

- `wv` can load and parse the Word 2000, 97, 95 and 6 file formats.
- `wvware` is a document converter that uses `wv` to import `.doc` files. The output formats include `.rtf`, `.txt`, `.tex`, `.pdf` and `.html`. See the unofficial mirror.
- `abiword` is a word processor that uses `wv` to import `.doc` files. It has a command line interface and a server mode, similar to OpenOffice, that can be used as a document converter. `wvware` deprecated its own suite of converters in favour of `abiword`.
- `wv2` is the successor to `wv`. It depends on `zlib`, `libgsf`, `libbz2`, `libxml2`, `libiconv` and `glib`, which in turn depends on `libffi` and `libpcre`.
- `doctotext` is a document converter that uses `wv2` to import `.doc` files. Additionally, it uses `libcharsetdetect`, `htmlcxx`, `libmimetic` and `minizip` to support other input formats. The output format is always plain text.
- `pthread-win32`: the NuGet package might not work; it may need to be compiled from source.

Extract plain text from various file types:

status:=DocToText (document;options;attachments)
Parameter | Type | Description |
---|---|---|
document | BLOB | |
options | Object | see below |
attachments | Array BLOB | |
status | Object | |

Properties of `options`:

Property | Type | Description |
---|---|---|
xml | Text | `parse` (default), `fix`, `strip` |
table | Text | `table` (default), `row`, `col` |
url | Text | `underscored` (default), `text`, `extended` |
list | Text | `*` (default) or any string |
verbose | Boolean | `false` (default) |
fallback | Boolean | `false` (default) |
format | Text | `.doc` (default), `.rtf`, `.docx`, `.pptx`, `.xlsx`, `.fodt`, `.fods`, `.fodp`, `.fodg`, `.odt`, `.ods`, `.odp`, `.odg`, `.ppt`, `.xls`, `.xlsb`, `.pages`, `.numbers`, `.key`, `.html`, `.pdf`, `.eml` |
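
A minimal sketch of a call, assuming the signature and option names listed above. The file path is hypothetical, and because the properties of `status` are not described here, the example only dumps the returned object for inspection rather than reading any particular property.

```4d
C_TEXT($path)
C_BLOB($document)
C_OBJECT($options;$status)
ARRAY BLOB($attachments;0)

  // Hypothetical input file; replace with a real document path
$path:=System folder(Desktop)+"sample.doc"
DOCUMENT TO BLOB($path;$document)

If (OK=1)
	  // Only properties from the options table are used
	$options:=New object("format";".doc";"table";"row";"verbose";False)
	$status:=DocToText ($document;$options;$attachments)
	  // The shape of $status is an unknown here, so just display it
	ALERT(JSON Stringify($status;*))
End if
```

Any property left out of `options` should fall back to the default shown in the table.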