Parse text contents from common file formats



Grab the text from common document formats with one command. DocRipper is an extremely lightweight Ruby wrapper that parses the text contents of common file formats (currently .doc, .docx, .pdf, .txt, and .sketch) without heavy dependencies such as an OCR library or OpenOffice/LibreOffice.

For simple parsing, you'll likely see a large performance improvement with DocRipper over solutions that rely on OpenOffice/LibreOffice for .doc/.docx conversion.

Need OCR support or in-image text parsing? Take a look at Docsplit.

Supported File Formats

File format   Supported?   Dependencies
.doc          x            Antiword
.docx         x
.pdf          x            Poppler-utils
.txt          x
.sketch       x            Sqlite3
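
Since some formats rely on external tools, it can help to check for them before parsing. The sketch below is not part of DocRipper. It assumes the dependencies in the table are invoked as the command-line programs `antiword`, `pdftotext` (shipped with Poppler-utils), and `sqlite3`; the helper names `executable_on_path?` and `missing_tool_for` are hypothetical.

```ruby
# Assumed mapping from file extension to the command-line tool it needs.
# The executable names are an assumption; the table only names the packages.
REQUIRED_TOOLS = {
  '.doc'    => 'antiword',
  '.pdf'    => 'pdftotext',
  '.sketch' => 'sqlite3'
}.freeze

# True if an executable file called `name` exists in any PATH directory.
def executable_on_path?(name, path_env = ENV.fetch('PATH', ''))
  path_env.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Returns the name of the missing tool for `path`, or nil if nothing is
# missing (or the format needs no external tool).
def missing_tool_for(path)
  tool = REQUIRED_TOOLS[File.extname(path).downcase]
  return nil if tool.nil? || executable_on_path?(tool)

  tool
end
```

For example, `missing_tool_for('report.pdf')` would return `'pdftotext'` on a machine without Poppler-utils, and nil once it is installed.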


  gem install doc_ripper

Specify the path of the file to parse

  require 'doc_ripper'


If the file cannot be read, nil will be returned.

  => nil
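
Because an unreadable file yields nil rather than an error, callers usually need a nil check. A minimal sketch follows, assuming the module-level call form `DocRipper.rip(path)` (this README only names `rip` and `rip!`, so check the gem for the exact form). The helper `text_or_fallback` and its `ripper:` parameter are hypothetical.

```ruby
begin
  require 'doc_ripper'
rescue LoadError
  # Gem not installed: the helper still works with any callable passed as `ripper`.
end

# Returns the extracted text, or `fallback` when the ripper returns nil.
# `ripper` defaults to DocRipper.rip (assumed call form) and can be swapped
# for any object that responds to #call.
def text_or_fallback(path, fallback: '', ripper: nil)
  ripper ||= ->(p) { DocRipper.rip(p) }
  text = ripper.call(path)
  text.nil? ? fallback : text
end
```

With the gem installed, `text_or_fallback('/path/to/file.pdf', fallback: '(no text)')` would return either the parsed text or the fallback string, never nil.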

Want to raise an exception? Use #rip!

#rip! raises an exception if rip would return nil or if the file type isn't supported

  # invalid file type
  => DocRipper::UnsupportedFileType

  # missing file
  => DocRipper::FileNotFound
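
The two exceptions can be rescued separately to report why parsing failed. This is a sketch under assumptions: it uses the call form `DocRipper.rip!(path)`, and it defines stand-in error classes only when the gem is not installed so the snippet runs on its own. The helper `rip_with_report` and its `ripper:` parameter are hypothetical.

```ruby
begin
  require 'doc_ripper'
rescue LoadError
  # Stand-ins so the sketch runs without the gem; the real classes come from DocRipper.
  module DocRipper
    class UnsupportedFileType < StandardError; end
    class FileNotFound < StandardError; end
  end
end

# Wraps a rip! call and turns the two documented exceptions into a result hash.
# `ripper` defaults to DocRipper.rip! (assumed call form).
def rip_with_report(path, ripper: ->(p) { DocRipper.rip!(p) })
  { ok: true, text: ripper.call(path) }
rescue DocRipper::UnsupportedFileType
  { ok: false, reason: :unsupported_file_type }
rescue DocRipper::FileNotFound
  { ok: false, reason: :file_not_found }
end
```

Any other exception is left to propagate, so unexpected failures are not silently swallowed.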