
miyako/4d-plugin-doctotext


4d-plugin-doctotext

4D implementation of DocToText.


Abstract

The goal of this project is to support legacy Microsoft Word documents with the .doc file extension.

  • wv can load and parse Word 2000, 97, 95 and 6 file formats.

  • wvware is a document converter that uses wv to import .doc files. The output formats include .rtf, .txt, .tex, .pdf and .html. See the unofficial mirror.

  • abiword is a word processor that uses wv to import .doc files. It has a command line interface and a server mode, similar to OpenOffice, that can be used as a document converter. wvware deprecated its own suite of converters in favour of abiword.

  • wv2 is the successor to wv. It depends on zlib, libgsf, libbz2, libxml2, libiconv and glib, which in turn depends on libffi and libpcre.

  • doctotext is a document converter that uses wv2 to import .doc files. Additionally, it uses libcharsetdetect, htmlcxx, libmimetic and minizip to support other input formats. The output format is always plain text.

  • The pthread-win32 NuGet package might not work; you may need to compile pthread-win32 from source.

Features

Extract plain text from various file types (the supported types are listed under the format option below).

Syntax

status:=DocToText (document;options;attachments)
| Parameter | Type | Description |
| --- | --- | --- |
| document | BLOB | |
| options | Object | see below |
| attachments | Array BLOB | |
| status | Object | |

Options

| Property | Type | Description |
| --- | --- | --- |
| xml | Text | parse (default), fix, strip |
| table | Text | table (default), row, col |
| url | Text | underscored (default), text, extended |
| list | Text | * (default) or any string |
| verbose | Boolean | false (default) |
| fallback | Boolean | false (default) |
| format | Text | .doc (default), .rtf, .docx, .pptx, .xlsx, .fodt, .fods, .fodp, .fodg, .odt, .ods, .odp, .odg, .ppt, .xls, .xlsb, .pages, .numbers, .key, .html, .pdf, .eml |
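
Example

The syntax and options above can be combined roughly as follows. This is a minimal sketch rather than an official sample: the file path is a hypothetical placeholder, every option is set to its documented default for illustration, and it assumes a 4D version in which `New object` is available. The properties of `status` are not listed in this section, so the sketch only dumps the object as JSON for inspection. How the plugin uses the attachments array is also not described here, so it is passed in empty.

```4d
  // Minimal sketch; the path below is a hypothetical placeholder.
C_BLOB($document)
C_OBJECT($options;$status)
C_TEXT($path;$json)
ARRAY BLOB($attachments;0)

$path:=System folder(Desktop)+"sample.doc"  // hypothetical input file
DOCUMENT TO BLOB($path;$document)

If (OK=1)  // the file was read successfully

	  // All values below are the documented defaults, spelled out for clarity.
	$options:=New object(\
	"format";".doc";\
	"xml";"parse";\
	"table";"table";\
	"url";"underscored";\
	"list";"*";\
	"verbose";False;\
	"fallback";False)

	$status:=DocToText ($document;$options;$attachments)

	  // The shape of status is not documented above; inspect it as JSON.
	$json:=JSON Stringify($status;*)
	ALERT($json)

End if
```

To process another file type, change `format` to one of the extensions listed in the Options table, for example ".docx" or ".pdf".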