A Ruby gem to liberate content from the jail that is Word documents
Link to blog post here
- Open the file in Microsoft Word
- Select "File" -> "Save as Web Page"
- Hit save
require 'word-to-markdown'

doc = WordToMarkdown.new("/path/to/export.htm")
# => <WordToMarkdown path="/path/to/export.htm">
doc.to_s
# => "# Test\n\n This is a test"
doc.html
# => "<html>\n\n<head>..."
doc.doc
# => <Nokogiri Document>
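The calls above handle one file at a time. As a rough sketch of how a folder of exports might be converted in one pass, the helper below writes a .md file next to each .htm file. The name convert_exports and the converter argument are hypothetical and not part of the gem; the converter is any callable that takes a path and returns Markdown, so the helper can be tried without the gem installed.

```ruby
# Hypothetical helper, not part of the gem.
# Converts every .htm export in `dir` and writes a .md file alongside it.
# `converter` is any callable that takes a file path and returns Markdown.
def convert_exports(dir, converter)
  Dir.glob(File.join(dir, '*.htm')).sort.map do |htm_path|
    md_path = htm_path.sub(/\.htm\z/, '.md')
    File.write(md_path, converter.call(htm_path))
    md_path # collect the paths that were written
  end
end

# With the gem installed, the converter could plausibly be:
#   converter = ->(path) { WordToMarkdown.new(path).to_s }
#   convert_exports('/path/to/exports', converter)
```

Passing the converter in as an argument keeps the file handling separate from the gem itself, which makes the helper easy to exercise with a stub.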
- Paragraphs
- Numbered lists
- Unnumbered lists
- Italic
- Bold
- Explicit headings (e.g., selected as "Heading 1" or "Heading 2")
- Implicit headings (e.g., text with a larger font size relative to paragraph text)
- Images
- Tables
- Nested lists
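To illustrate what "implicit headings" means in the list above, here is a minimal sketch of one way font sizes could be mapped to heading levels. It is an illustration only and does not claim to be how the gem does it; the method name implicit_heading_levels and its arguments are hypothetical.

```ruby
# Hypothetical illustration, not the gem's implementation.
# Given the font sizes found in a document and the size of ordinary
# paragraph text, treat every larger size as a heading: the largest
# size becomes level 1, the next largest level 2, and so on (capped at 6).
def implicit_heading_levels(font_sizes, body_size)
  heading_sizes = font_sizes.select { |size| size > body_size }.uniq.sort.reverse
  heading_sizes.each_with_index.map { |size, i| [size, [i + 1, 6].min] }.to_h
end

levels = implicit_heading_levels([12, 24, 12, 18, 12, 18], 12)
puts levels[24] # level assigned to 24pt text
puts levels[18] # level assigned to 18pt text
```

Text at the body size gets no entry in the result, so it would stay a plain paragraph.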
script/cibuild
The development version of the gem contains a lightweight server for converting Word documents as a service.
To run the server, run script/server and open localhost:9292 in your browser. The server can also be run on Heroku.
A live version runs at word-to-markdown.herokuapp.com.
You can also use it as a service by posting raw HTML to /raw, which will return the raw Markdown in response.
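As a sketch of what a client for the /raw endpoint might look like, the snippet below uses Ruby's standard Net::HTTP library. The exact request format is not documented here, so this assumes the HTML is sent as the raw request body with a text/html content type; the method names build_raw_request and html_to_markdown are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Builds (but does not send) a POST request for the /raw endpoint.
# Assumption: the server reads the HTML from the raw request body.
def build_raw_request(base_url, html)
  uri = URI.join(base_url, '/raw')
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'text/html'
  request.body = html
  [uri, request]
end

# Sends the request and returns the response body, which should be Markdown.
def html_to_markdown(base_url, html)
  uri, request = build_raw_request(base_url, html)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.body
end

# Example, assuming script/server is running locally:
#   puts html_to_markdown('http://localhost:9292', File.read('/path/to/export.htm'))
```

If the server expects a form field instead of a raw body, only build_raw_request would need to change.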