Extract Text from Microsoft Word Documents
Wraps the AntiWord utility to extract text from Microsoft Word documents. The utility only supports the old doc format, not the new XML-based docx format. Use the 'xml2' package to read the latter.
Install the package directly from CRAN:
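The install command does not appear in the text above. Assuming a standard R setup with a CRAN mirror configured, the usual call would be:

```r
# Standard CRAN installation; assumes a CRAN mirror is configured
install.packages("antiword")
```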
The package has only a single function, antiword(). It takes either a local file path or a URL to a Word document:
library(antiword)
text <- antiword("https://jeroen.github.io/files/UDHR-english.doc")
cat(text)
Universal Declaration of Human Rights
Plain English Version
All people everywhere have the same human rights, which no one can take away. This is the basis of freedom, justice and peace in the world. This Declaration affirms the dignity and worth of all people, and the equal rights of women and men. The rights described here are the common standard for all people everywhere. Every person and nation is asked to support the understanding and respect for these rights, and to take steps to make sure that they are recognised and observed everywhere, for all people.
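Because antiword() handles only the doc format, a script that receives mixed input may want to choose a reader by file extension. The following is a minimal sketch, not part of the package: read_word_text() is a hypothetical helper, and the docx branch assumes the usual layout of a docx archive (a zip file whose body text sits in word/document.xml, with text runs in w:t elements inside w:p paragraphs).

```r
# Hypothetical helper (not provided by antiword): choose a reader by extension.
read_word_text <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "doc") {
    # Old binary format: delegate to antiword
    antiword::antiword(path)
  } else if (ext == "docx") {
    # Assumption: a docx file is a zip archive with the body in word/document.xml
    tmp <- tempfile()
    dir.create(tmp)
    on.exit(unlink(tmp, recursive = TRUE))
    utils::unzip(path, files = "word/document.xml", exdir = tmp)
    doc <- xml2::read_xml(file.path(tmp, "word", "document.xml"))
    # One output line per paragraph, joining the text runs inside it
    paragraphs <- xml2::xml_find_all(doc, "//w:p")
    lines <- vapply(paragraphs, function(p) {
      paste(xml2::xml_text(xml2::xml_find_all(p, ".//w:t")), collapse = "")
    }, character(1))
    paste(lines, collapse = "\n")
  } else {
    stop("Unsupported file extension: ", ext)
  }
}
```

A call such as read_word_text("report.doc") (a placeholder path) would go through antiword(), while a docx path would take the 'xml2' branch. The docx branch recovers paragraph text only; tables, headers and footnotes would need extra handling.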
This package is part of rOpenSci!